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Abstract. 

We  present  expressions  for  absolute  and  relative  errors  in  individual  components  of  the  so¬ 
lution  to  systems  of  linear  equations.  We  consider  three  kinds  of  linear  systems:  non-singular, 
underdetermined  of  full  row  rank,  and  least  squares  of  full  column  rank.  No  assumptions  regarding 
the  structure  or  distribution  of  the  perturbations  are  required. 

Our  expressions  for  component-wise  relative  errors  allow  the  following  conclusions:  For  any 
linear  system  there  is  at  least  one  solution  component  whose  sensitivity  to  perturbations  is  propor¬ 
tional  to  the  condition  number  of  the  matrix;  but  -  depending  on  the  relation  between  right-hand 
side  and  matrix  -  there  may  exist  components  that  are  much  better  conditioned.  For  a  least  squares 
problem,  the  sensitivity  of  the  components  also  depends  on  the  right-hand  side  and  may  be  as  high 
as  the  square  of  the  condition  number.  Least  squares  problems  are  therefore  always  more  receptive 
to  ill-conditioning  than  linear  systems. 

In  addition,  we  show  that  the  component- wise  relative  errors  for  linear  systems  are  reduced  by 
column  scaling  only  if  column  scaling  manages  to  reduce  the  perturbations.  Regarding  underde¬ 
termined  linear  systems  of  full  column  rank,  the  problem  of  finding  the  minimal-norm  solution  can 
be  formulated  so  that  the  same  analysis  as  for  least  squares  problems  is  applicable  here  as  well. 

Finally,  we  define  component-wise  condition  numbers  that  measure  the  sensitivity  of  the  so¬ 
lution  components  to  perturbations.  They  have  simple  geometric  interpretations  and  can  be  com¬ 
puted  and  estimated  as  efficiently  as  the  conventional  condition  numbers. 
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1  Introduction 


Most  people  would  probably  believe  that  there  is  nothing  left  to  be  done  when  it  comes  to  error 
analysis  for  the  solution  of  linear  systems  of  equations  and  linear  least  squares  problems,  especially 
where  perturbation  analysis  without  regard  to  a  particular  algorithm  is  concerned.  So,  why  yet 
another  paper  on  the  subject? 

We  want  to  demonstrate  that  a  careful  perturbation  analysis  is  capable  of  providing  a  realistic 
assessment  of  the  error  and  reliable  measures  of  the  sensitivity  of  the  solution  to  perturbations  in 
the  data. 

In  particular,  we  derive  expressions  for  the  errors  in  individual  components  of  the  solution  vector. 
These  expressions  give  rise  to  realistic  and  efficiently  computable  error  bounds.  The  derivations  of 
the  error  expressions  require  no  restrictions  on  the  structure  or  distribution  of  the  perturbations. 
Without  any  knowledge  of  the  underlying  algorithm,  we  can  therefore  obtain  a  great  deal  of  infor¬ 
mation  about  the  sensitivity  of  individual  solution  components  to  perturbations  in  the  data  -  much 
more,  in  fact,  than  what  is  provided  by  conventional  perturbation  results. 


1.1  Motivation 


Consider  the  solution  of  a  system  of  linear  equations  Ax  =  b  with  non-singular  coefficient  matrix  A. 
The  computed  solution  x,  which  is  usually  different  from  the  true  solution  x,  can  be  viewed  as 
the  true  solution  to  a  perturbed  system  ( A  +  F)x  =  6  +  /.  Let’s  assume  we  do  not  know  which 
algorithm  was  used  for  the  computation  of  x,  so  we  have  no  knowledge  about  the  structure  of  the 
perturbations  F  and  /. 


Only  very  infrequently,  e.g.  [4,  15],  does  one  try  to  assess  the  error  in  individual  solution 
components.  The  conventional  way  of  assessing  the  error  in  x,  as  compared  to  the  true  solution  x, 
is  to  estimate  an  upper  bound  on  the  norm-based1  relative  error  ||x  —  x||/||x||.  The  most  commonly 
used  first-order  bound  is 

where  the  condition  number  «(/l)  =  ||A||  |j  A_1||  >  1  acts  as  an  amplifier  for  the  relative  perturbations 
in  the  data  pA  =  ||,F||/|jA||  and  pj  =  ||/||/||6||.  This  norm-based  bound  has  led  to  a  rule  of  thumb: 
If,  for  instance,  k(A)  is  about  103,  and  the  size  of  the  relative  perturbations  is  about  10-7,  then  the 
computed  solution  x  can  be  expected  to  be  accurate  to  7  —  3  =  4  significant  digits. 


In  many  situations  this  type  of  error  assessment  is  just  fine  -  unless,  however,  the  individual 
components  of  the  solution  have  physical  significance  as,  for  example,  in  statistical  applications  [21]. 
Consider  the  linear  system  Ax  =  6,  where 


A-{o  l)’  6-(o)’  *~(o) 


/or 


Suppose  the  computed  solution  is  x  =  ^  |  ^ ,  where  e  is  a  very  small  positive  number.  Then  x  can  1 
be  viewed  as  the  true  solution  to  the  perturbed  system 


□ 

□ 


a+f=a={1  ;),  >+/=(;). 


ft&tL 

n/ 


In 


’The  following  inequalities  hold  for  any  vector  p-norm  and  induced  matrix  norm;  see  Section  2  in  [12],  for  instance.  ,  . 

•  I  is  paper  we  use  the  two-norm.  -  * 
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Because  A  is  the  identity  matrix,  >c(A)  =  1,  and  the  above  error  bound  tells  us  that  ||£  —  x||/||x||<£. 
So  the  error  in  the  solution  seems  to  be  no  more  than  the  error  in  the  data,  which  is  all  we  are 
entitled  to.  However,  the  second  component  of  the  computed  solution  has  component-wise  relative 
error2 

*2  ~  x2  _  (  ~  0  _  j 

*2  ~  « 

and  is  thus  totally  wrong.  Therefore,  a  small  bound  on  the  norm-based  error  does  not  guarantee 
accuracy  in  individual  components  of  the  computed  solution. 


Of  course,  you  could  argue  now  that  this  should  have  been  anticipated.  Since  is  zero,  hence 
small  in  magnitude,  one  should  not  expect  to  compute  it  correctly  in  the  first  place.  Accordingly, 
we  could  account  for  it  by  estimating  the  error  in  each  component  Xi  of  the  computed  solution  via 


<  T(P/l  +  Pi). 

|x«| 


provided  x <  ^  0.  The  amplifiers  for  the  relative  perturbations  are  now  the  condition  number,  as 
well  as  the  size  of  an  individual  component  relative  to  the  whole  solution.  This  modification  yields 
a  correct  assessment  for  the  errors  in  individual  solution  components  of  the  above  example. 


Unfortunately,  we  have  not  really  fixed  the  problem.  The  condition  number  k(A)  can  still 
severely  over  estimate  the  error  in  some  solution  components,  as  the  following  4x4  linear  system 
demonstrates. 


/ 

0.4919 

0.1112 

-0.6234 

-0.6228  \ 

/ 

0.4351  \ 

-0.5050 

-0.6239 

0.0589 

0.0595 

A  — 

-0.1929 

0.5728 

-0.0843 

0.7480 

0.7483 

,  0  — 

0.6165 

\ 

-0.4181 

0.7689 

0.2200 

0.2204  j 

\ 

-0.8022  / 

The  first  three  columns  of  A  are  nearly  orthogonal  while  the  last  two  columns  are  almost  identical. 
Both  the  two-norm  condition  number  k2(A)  and  Skeel’s  condition  number  [19]  are  larger  than  103. 
Note  that  the  matrix  is  not  ill-scaled. 


But  the  ‘component-wise  condition  numbers’  that  we  will  introduce  in  this  paper  turn  out  to  be 

<1.1,  <1.1,  >  103,  >103. 

This  means  that  the  first  two  components  of  x  are  well-conditioned  and  the  remaining  two  are 
ill-conditioned,  regardless  of  the  perturbations.  To  illustrate  this,  compare  the  ‘exact’  solution 
computed  with  16-digit  arithmetic 

xT  = (1.000075414240576  -.5000879795933286  -.0242511388797165  .02624513955005858), 

with  the  solution  computed  with  4-digit  arithmetic,  which  can  be  viewed  as  the  solution  to  a  per¬ 
turbed  problem, 

xT  =  (  1.000  -.5003  -.0589  .06090). 

As  oredicted  by  our  component-wise  condition  numbers,  the  first  two  components  are  accurate  to 
almost  four  digits,  whereas  the  last  two  have  no  accuracy  whatsoever.  As  far  as  we  know  no  other 
existing  condition  numbers  can  predict  the  well-conditioning  of  the  first  two  components  of  this 
system. 

Therefore,  the  conventional  norm-based  bounds  are  apparently  not  able  to  estimate  the  accuracy 
of  individual  components  correctly.  We  hope  to  have  now  provided  enough  motivation  for  the  need 
to  study  component-wise  relative  errors  and  the  sensitivity  to  perturbations  of  individual  solution 
components. 

2 Whenever  x,  =  0  while  r,  ^  0,  the  component- wise  relative  error  has  x,  instead  of  x,  in  the  denominator. 
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1.2 


Overview 


Given  a  linear  system  Ax  —  b  of  full  column  rank  and  a  perturbed  system  (A  +  F)x  —  b  +  /,  we 
derive  expressions  for  the  error  in  individual  components  of  the  computed  solution  x .  Our  work  is 
more  general  than  that  of  Skeel  [19]  on  component-wise  perturbations  and  that  of  Stewart  [22]  on 
stochastic  perturbations  because  we  make  no  assumptions  about  the  perturbations  F  and  /,  either 
their  size,  structure  or  distribution. 

In  particular,  we  show  that  there  is  always  one  component  of  the  solution  vector  whose  sensitivity 
to  relative  perturbations  is  proportional  to  the  condition  number  of  the  matrix;  but  -  depending 
on  the  right-hand  side  -  there  may  exist  components  that  are  much  better  conditioned.  Therefore 
the  conventional  upper  bounds  on  norm-wise  relative  errors  are  as  tight  as  possible,  and  if  they  are 
pessimistic  it  is  because  they  represent  an  inadequate  means  of  measuring  the  error. 

We  derive  condition  numbers  for  individual  components  for  the  solution  of  a  linear  system,  which 
we  call  ‘component-wise  condition  numbers’.  We  thus  associate  with  a  linear  system  Ax  =  b  not  a 
single  condition  number  but  a  set  of  condition  numbers.  Our  work,  although  developed  indepen¬ 
dently,  can  therefore  be  considered  a  continuation  of  Stewart’s  work  on  collinearity  in  regression 
problems  [21].  The  singular  value  decomposition,  often  used  to  determine  the  conventional  condi¬ 
tion  number  of  a  matrix,  provides  a  basis  for  the  column  space  but  does  not  relate  this  basis  to 
the  columns  of  the  matrix.  In  contrast,  Stewart’s  condition  numbers  are  designed  to  expose  the 
most  linearly  dependent  columns  of  a  matrix.  They  are  embedded  in  our  component-wise  condi¬ 
tion  numbers,  whose  purpose  is  not  only  to  recognise  linearly  dependent  columns  but  also  to  reflect 
the  relationship  between  matrix  and  right-hand  side.  We  provide  a  geometric  interpretation  for 
Stewart’s  condition  numbers  and  demonstrate  that  they  are  ‘inherent’  in  the  inverse  of  the  matrix. 

All  of  our  results  also  hold  for  the  solution  of  linear  least  squares  problems  miny  ||Ay  —  6||  of  full 
column  rank.  The  set  of  component- wise  condition  numbers  for  a  least  squares  problem  contains 
those  for  a  linear  system  as  a  subset,  hence  the  sensitivity  of  some  solution  components  may  be  much 
lower  than  the  condition  number.  In  particular,  we  show  that  there  is  a  component  of  the  solution 
vector  whose  sensitivity  to  relative  perturbations  equals  at  least  the  product  of  condition  number 
and  tan  0,  where  6  is  the  angle  between  the  right-hand  side  and  the  column  space  of  the  matrix;  the 
sensitivity  can  be  as  high  as  the  product  of  tan#  and  the  square  of  the  condition  number.  Least 
squares  problems  are  therefore  always  more  receptive  to  ill-conditioning  than  linear  systems. 

In  addition,  we  show  that  the  component-wise  relative  errors  for  linear  systems  are  reduced  by 
column  scaling  only  if  column  scaling  manages  to  reduce  the  perturbations.  Regarding  underdeter¬ 
mined  linear  systems  of  full  column  rank,  the  problem  of  finding  the  minimal-norm  solution  can  be 
formulated  so  that  the  same  analysis  as  for  least  squares  problems  is  applicable. 

The  expressions  for  the  errors  in  the  solution  of  least  squares  problems  and  underdetermined 
linear  systems  can  be  used,  for  instance,  to  obtain  perturbation  results  for  the  computation  of  left 
and  right  inverses  of  matrix. 

In  Section  2  we  present  the  basic  ideas  contained  in  this  paper.  We  derive  them  from  first 
principles,  keeping  technical  details  to  a  minimum.  Section  3  and  Appendix  2  contain  a  detailed 
perturbation  theory  for  the  solution  of  linear  systems  of  full  column  rank,  and  Section  4  extends 
it  to  the  solution  of  least  squares  problems  of  full  column  rank.  The  treatment  of  full  rank  least 
squares  problems  is  extended  to  the  solution  of  underdetermined  linear  systems  of  full  row  rank 
in  Section  5.  In  Section  6  we  discuss  the  efficient  computation  and  estimation  of  component-wise 
condition  numbers.  In  particular,  we  show  how  to  compute  them  via  updating  QR  decompositions, 
and  how  to  estimate  them  by  means  of  conventional  condition  numbers  estimators.  A  short  summary 
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of  the  paper  is  followed  by  Appendix  1,  where  expressions  for  the  left-inverse  of  a  matrix  are  derived 
in  order  to  justify  our  choice  of  condition  numbers  as  a  natural  measure  of  sensitivity. 

Although  we  concentrate  on  component-wise  relative  errors,  expressions  for  component-wise  ab¬ 
solute  errors  are  also  included;  the  corresponding  condition  numbers  can  be  computed  as  easily  as 
those  for  conventional  norm-based  errors. 


1.3  Summary  of  Notation 


We  give  a  brief  summary  of  frequently  used  notation  for  easy  reference.  This  notation  is  also 
introduced  in  the  body  of  the  paper  whenever  it  appears  for  the  first  time. 


The  norm  ||  •  ||  represents  the  two-norm,  and  ej  stands  for  the  tth  column  of  the  identity  matrix  I, 
whose  order  will  be  clear  from  the  context.  The  column  space  of  a  matrix  A,  {c  :  Ax  =  c},  is 
represented  by  H(A)  and  its  nullspace,  {x  :  Ax  =  0}  by  Ker(A).  The  subspace  in  real  n-space 
that  is  orthogonal  to  the  space  span{vi , . . . ,  «k}  spanned  by  n  x  1  vectors  t>i ,  . . .,  v*  is  denoted  by 
spanx{vj, . . . ,  Ufc}. 


The  columns  of  a  n  x  m  matrix  A  are  denoted  by  a,,  and  if  A  is  of  rank 
left-inverse  A*  are  denoted  by  rf, 


m  the  rows  of  its 


A  =  (al  ...  am ) , 


At  = 


The  singular  value  decomposition  (SVD)  of  a  n  x  m  matrix  A,  n  >  m,  is  represented  as  A  = 
UHV7 ,  where  isanxn  orthogonal  matrix,  V  is  a  mxm  orthogonal  matrix,  and  the  mxn  diagonal 
matrix  E  has  as  its  diagonal  elements  the  singular  values  of  A  in  descending  order  <?i  >  . . .  >  crm  >  0. 
The  two-norm  condition  number  of  a  full-rank  matrix  A  is  denoted  by  k(A)  =  ||A||  ||A^||. 

If  x  solves  the  least  squares  problem  miny  ||  Ay  —  6||  then  the  residual  is  denoted  by  r  =  b  —  Ax. 


2  The  Basic  Ideas 


We  start  out  by  illustrating  the  ideas  that  led  us  to  pursue  a  component-wise  perturbation  analysis; 
this  is  done  by  studying  perturbations  in  the  right-hand  side  only.  We  also  restrict  ourselves  to  the 
solution  of  full-rank  least  squares  problems  until  Section  5  where  the  results  are  extended  to  the 
solution  of  underdetermined  linear  systems  of  full  row  rank. 

As  for  notation,  |j  •  ||  represents  the  two-norm,  and  e,-  stands  for  the  ith  column  of  the  identity 
matrix  l. 


2.1  Motivation 

The  first  theorem  gives  a  simple  geometric  interpretation  of  the  components  of  the  solution  x  to  a 
full-rank  least  squares  problem  miny  ||At/  —  6||.  An  individual  solution  component  can  be  expressed 
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as  a  product  of  three  factors:  the  length  of  a  row  in  the  left-inverse  A\  the  length  of  the  right-hand 
side  and  the  angle  between  the  two. 


Theorem  1  Given  a  n  x  m  matrix  A  of  rank  m,  denote  by  rj  the  rows  of  its  left-inverse  A 


A*  =  (ATA)-lAT  =  I  :  I 

V*/ 


Then  the  components  n  of  the  solution  x  to  the  least  squares  problem  minv  ||Ay  —  6||  are  given  by 

Xi  =  rfb  =  1 1 r,- 1 1 1|6||  cos  pi,  1  <  *  <  m, 
where  Pi  is  the  angle  between  r,-  and  b. 


Proof:  The  vector  x  solves  miny  || Ay  —  6|j  if  and  if  only  it  solves  the  normal  equations  AT  Ax  =  ATb. 
So  x  —  At  b,  which  implies  x<  =  rfb  =  |Jr,  ||  ||6|j  cos  fa,  where  fa  is  the  angle  between  r,-  and  b.  ■ 

Already  in  [20]  Stewart  recognised  the  importance  of  the  1 1 r-,- 1 1  for  the  purpose  of  detecting  almost 
linearly  dependent  columns  in  A.  In  fact,  it  turns  out  that  length  and  angles  associated  with  the  r* 
indicate  the  sensitivity  of  individual  components  of  the  solution  x  to  perturbations  in  the  right-hand 
side. 


Theorem  2  Given  a  matrix  A  of  full  column  rank,  let  x  ^  0  solve  miny  ||Ay  -  6||  and  let  x  solve 
miny  II Ay  — (t +  /))]. 

Denote  by  xbi  the  angle  between  r,  and  f.  Then 

Xi  =  xi  +  rj f  =  n  +  j|rj||  ||/||  cos  V’i- 
If  H  £  0  and  =  ||/||/||6||  then 


-  Xj 
Xi 


cos  fa 


(i  COS  tpi 


ijJjfjjjjj^lWI  INI 


Proof:  According  to  Theorem  1, 

fa  =  rj (b  +  f)  =  rjb+rjf  =  Xi+rjf  =  n  +  ||r<||  ||/||  cos  t l>i, 
where  tpi  is  the  angle  between  and  /.  Since  0  ^  n  =  rfb  =  ||rj||  [|6||  cos we  have 

Xi  -  Xi  rt  I  I  Il/H 


rJl 

rib 


cos  0i  ||6|| 


COS  t/>i . 


The  theorem  states  that  the  absolute  perturbation  ||/||cosV’i  in  —  Xj  is  amplified  by  ||r,||.  In 
the  first  expression  for  the  relative  error,  the  perturbation  cj  cos ipi  is  amplied  by  1/cos Pi.  That 
is,  the  ‘more  orthogonal’  6  is  to  r^,  the  smaller  is  cos  Pi,  and  the  larger  is  the  amplification  of  the 
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relative  perturbation.  Therefore,  the  component-wise  relative  error  is  likely  to  increase,  the  more 
orthogonal  rj  is  to  the  right-hand  side. 

Comparing  the  two  amplifiers  we  see  that  the  amplifier  ||r,||  in  the  absolute  error  only  refers  to 
the  matrix  and  ignores  6,  while  the  amplifier  1/cos  /?,•  in  the  first  expression  for  the  relative  error 
describes  a  relationship  between  the  matrix  and  the  right-hand  side. 

The  second  expression  for  the  relative  error  in  Theorem  2  is  more  conventional  and  perhaps 
easier  to  interpret.  It  consists  of  the  relative  perturbation  f 4  cos  ipi ,  amplified  by  three  factors:  the 
magnitude  of  x,  relative  to  ||z||;  the  term  ||i4||  ||i*j||,  which  describes  the  condition  of  the  matrix  and 
will  be  studied  more  closely  in  Section  2.2;  and  the  term  ||J||1>|g|| ,  which  is  common  to  all  components 
and  describes  the  relation  between  matrix  and  right-hand  side.  If  we  denote  by  /c(A)  =  ||.«4||  ||At|| 
the  condition  number  of  the  matrix  A  then  ||A||  ||rj||  can  be  bounded  by 

1  =  ||ef  ||  =  UefA'AU  <  \\ef  A'\\\\A\\  =  ||A||||r,||  <  k(A ), 

A  lower  bound  for  p^|^|j ,  provided  x  ^  0,  is 

JJ]__  J _ 1  J_ 

||A||||x||-||A||llgll^x(A)- 

In  case  of  a  linear  system  Ax  —  b, 

11*11  _  P*ll 

\\A\\\\x\\-\\A\\\\x\\-  ’ 

otherwise  it  can  be  unbounded  since  b  may  be  almost  orthogonal  to  all  rows  of  A* . 

Therefore,  the  component-wise  relative  error  tends  to  be  large  for  those  components  x,  whose 
size  is  small  in  comparison  to  ||z||,  or  whose  matrix  condition  number  ||A||  ||r,||  is  large,  or  whose 
right-hand  side  is  nearly  orthogonal  to  all  rows  of  A * .  The  three  amplification  factors  in  the  second 
expression  for  the  relative  error  in  Theorem  2  provide  a  clear  separation  of  the  factors  responsible 
for  the  loss  of  accuracy  in  the  computed  solution:  relative  magnitude  of  the  solution  components, 
matrix  condition,  and  relationship  between  matrix  and  right-hand  side. 

In  Sections  3  and  4  we  show  that  the  same  quantities  that  determine  the  sensitivity  to  right-hand 
side  perturbations  also  determine  the  sensitivity  to  perturbations  in  the  matrix.  First,  though,  we 
relate  them  to  more  established  ways  of  measuring  sensitivity. 


2.2  Relation  to  Singular  Values 


The  goal  of  this  section  is  to  compare  the  amplification  factors  for  the  usual  norm-based  errors  with 
those  for  our  new  component-wise  errors. 

Because  the  two-norm  condition  number  k(A)  —  ||A|j  ||A*||  equals  the  ratio  of  the  extreme  singu¬ 
lar  values  of  A,  we  can  relate  the  ||r,  ||  to  the  singular  values  of  A  and  obtain  the  following  well-known 
inequalities. 


Theorem  3  Let  A  be  a  n  x  m  matrix  of  rank  m  with  singular  values  <T\  >  . . .  >  <rm  >  0,  and  denote 
by  rj  the  rows  of  .  Then 


&m  — 


(Tm  <  min  - — -  <  \fmo 

*  IMI 


m  • 


<  <7i> 
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//Ikmaxll  =  max*  ||r*||  then 

m  ||rm«,||  <  IMII  |N||  <  V^IMII  l|rm„||. 


Proof:  The  singular  values  of  the  left  inverse  A*  are  1  /<rt  )  Section  5.5.4  in  [12],  hence  \/a\  <  1 1 r j 1 1  < 
1  / <rm,  giving  the  first  set  of  inequalities. 

Let  A  —  U'£VT  be  the  singular  value  decomposition  of  A.  The  last  row  ef^VT  of  VT  is  a  vector 
with  unit  two-norm  in  3?m ,  so  at  least  one  of  its  components,  say  the  j th,  must  be  of  magnitude 
l/y/m.  Hence  the  jth  row  rj  of  A*  satisfies 

INI  =  ||Cf£(ET£)-1  VTej||  =  ||E(ETE)-lVTei||  >  le^£(£T£)-1l/Tei|  >  -L-L, 

y/Tn  (Tm 


yielding  the  second  set  of  inequalities. 

The  last  set  of  inequalities  comes  from  j]rmax||  <  ||Af  ||  =  l/crm-  ■ 


Applying  Theorem  3  to  the  second  expression  for  the  component-wise  error  in  Theorem  2  shows 
that  there  must  exist  a  component  it  for  which 


\xt  -  xt  1 

IN 


>  4=,,  'III,1  ,,  f&|cos^fc|. 

V™  Pll  11*11  IN 


Therefore,  the  sensitivity  of  xt  to  right-hand  side  perturbations  is  proportional  to  the  condition 
number  of  A  whenever  the  right-hand  side  has  an  appropriate  direction,  that  is,  whenever  p|>|[r|| 
is  not  too  small 


We  briefly  take  a  closer  look  at  this  last  condition.  When  Ax  =  b  and  6  is  a  singular  vector 
associated  with  the  smallest  singular  value  ttn,  of  A,  p»ft||/||6||  =  l/*m  =  Pf||,  then 


m  =  i 
INI  11*11  k(A)' 


and 


INI  Ml 


Pll  INI  = 


INII 


<  l. 


According  to  the  expressions  for  the  errors  in  Theorem  2,  the  sensitvity  of  all  solution  components 
to  right-hand  side  perturbations  is  then  solely  determined  by  their  relative  magnitude. 


The  existence  of  a  row  of  Af  whose  norm  approximates  \/am  well,  as  evidenced  by  Theorem  3, 
underlies  the  rank-revealing  QR  factorisations,  which  first  appeared  in  [11,  13],  and  are  further 
analysed  and  refined  in  [20,  10,  6,  21].  In  the  simplest  case,  the  goal  of  a  rank-revealing  QR 
factorisation  is  to  determine  the  most  linearly  dependent  column  of  a  matrix  A.  To  this  end  one 
performs  the  QR  factorisation  AP  =  QR,  where  Q  has  orthonormal  columns,  R  is  upper  triangular 
and  the  permutation  matrix  P  is  chosen  so  as  to  minimise  the  trailing  diagonal  element  ( R)mm  of  R. 
Then  the  inverse  of  this  element,  l/|(R)mm|  =  ||e^ 1 1|  =  ||r^ ||,  is  as  large  as  possible,  and  thus 
close  to  1  /<rm. 


While  Theorem  3  states  that  at  least  one  jjr-j  ||  approximates  the  smallest  singular  value  well,  the 
following  corollary  indicates  that  each  ||r,|j  cannot  stray  too  far  away  from  some  singular  value. 


Theorem  4  Let  A  be  a  n  x  m  matrix  of  rank  m  wtth  singular  values  <r\  >  . . .  >  <7m  >  0,  and  lei 
P||f  =  j  denote  the  Frobenius  norm  of  A. 
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1}  the  1 1 r j 1 1  are  ordered  by  increasing  norm,  11*7,1)  <  . . .  <  ||rjm||,  then 


Eiwt2  =  P,ii?-  = 

•si 


1  <  k  <  m  —  1. 


Proof:  The  equality  results  from  the  invariance  of  the  Frobenius  norm  under  orthogonal  transfor¬ 
mations,  Section  2.5.3  in  [12]. 

The  inequalities  are  obtained  by  applying  the  proof  of  Theorem  4.3.26  in  [17]  to  the  singular 
values  of  A'.  ■ 


Remark  1  It  is  important  to  realise  that  the  looseness  of  the  inequalities  in  Theorem  3  depends  on 
how  close  the  right  singular  vector  matrix  of  A  is  to  a  permutation  matrix:  if  A  —  UTjVt  is  the 
SVD  of  A  then 

INI  =  ||f/£(£T£)-1V'Te.-||  =  ||£(£Ti:)-1V'Tei||. 

Thus,  if  V  is  a  permutation  matrix  (this  includes  diagonal  matrices)  then  we  can  find  indices  that 
achieve  the  bounds  in  Theorem  3  since  ||r.||  =  1  /cr*  for  some  k. 


2.3  Conventional  Error  Bounds 


In  this  section  we  present  a  rather  unconventional  way  of  deriving  bounds  on  the  norm-based  relative 
error,  by  making  use  of  the  theorems  from  the  previous  sections. 

An  expression  for  the  absolute  norm-based  error  in  the  infinity-norm  is  available  from  Theorem  2, 

p-x||oo  =  max{||r,||l|/|||cos^,|}. 

Dividing  this  by  ||a:||  results  in  a  mixed-norm  relative  error 

=  nV“MW  Pwi'‘|c“w)' 

where  =  ||/||/||6||.  Denoting  by  k(A)  =  ||i4||  ||i4^|j  the  condition  number  of  A,  we  obtain  an  upper 
bound  for  the  norm-based  relative  error  from  Theorem  3, 


I*  ~*ll  .  /-II* -*1 

ii —H  —  Vm  , 


<  y/mK(A)  f6- 


In  case  of  a  linear  system  Ax  =  b,  ||6||  <  j|A||  j|r||  and  the  bound  simplifies  to 

<  Vm«(A)f6. 

In  this  last  form,  the  upper  bound  agrees  with  the  conventional  bounds.  Its  amplification  factor  for 
the  perturbations  consists  of  the  condition  number  «(A)  of  the  matrix  but  ignores  the  relationship 
between  matrix  and  right-hand  side. 


Theorem  3  also  comes  in  handy  for  the  derivation  of  the  lower  bound 


maxflMIIIM 


Ml 


11-41111*11 


€6 1  COS  | }  >  -^=K(A) 


mu 
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Figure  1:  Angles  Associated  with  Columns. 


where  n  =  max,  {||ri||  |  cos  ^|}/ max*  ||rk||. 


To  summarise,  we  have  derived  lower  and  upper  bounds  on  the  norm-wise  relative  error  for 
perturbations  restricted  to  the  right-hand  side, 


< 


<  \/mK(A) 


11*11  f 

Pll  11*11  *' 


In  the  absence  of  knowledge  about  the  value  of  cos  ip,  we  have  to  assume  the  worst  case  fi  =  1,  which 
implies  that  the  norm-based  error  bound  is  tight.  Therefore  the  conventional  upper  bounds  are  as 
good  as  possible  -  given  that  one  has  chosen  to  measure  a  norm-based  error.  We  have  therefore 
shown  that,  if  the  norm-wise  bounds  give  unsatisfying  information,  it  is  not  because  the  bounds  are 
loose  but  rather  because  an  unsatisfying  way  of  measuring  the  error  was  adopted  in  the  first  place. 


When  Ax  —  b  and  6  is  a  singular  vector  associated  with  the  smallest  singular  value  trm  of  A, 
Pf  WII  =  l/<rm  =  ||At||,  then  ||A||  ||*||/||6||  =  «(A)  and 

1  ^  II*- *11  ^  r= 

-  11*11  " 

In  this  special  case  the  norm-wise  relative  error  is  of  about  the  same  magnitude  as  the  perturbation 
in  the  right-hand  side  and  does  not  depend  on  the  condition  number  of  A,  an  observation  already 
made  by  Chan  and  Foulser  [7]. 


2.4  Geometric  Interpretation 


We  have  seen  so  far  that  individual  components  of  the  solution  z  to  a  full-rank  least  squares  problem 
minv  ||  Ay  —  6||  can  be  expressed  as  x*  =  ||r'a-||  j|6||  cos  ft,  where  rf  is  the  *th  row  of  A*  and  ft  is  the 
angle  between  r,  and  6;  that  ||rj||  and  1/cosft  determine  the  sensitivity  of  x,  to  perturbations  in  6; 
and  that  at  least  one  l/||r;  ||  approximates  the  smallest  singular  value  of  A  well. 

Now  we  want  to  give  a  geometric  interpretation  of  the  ||r,||  in  terms  of  the  columns  in  the  original 
matrix  A.  This  will  allow  us  to  determine  how  exactly  the  linear  independence  of  the  columns  of  A 
and  their  relationship  to  b  affects  the  sensitivity  of  individual  solution  components  to  perturbations. 

As  for  notation,  the  column  space  of  a  matrix  A  is  represented  by  7J(A)  and  its  nullspace  by 
Ker(A).  The  subspace  in  real  n-space  3?"  that  is  orthogonal  to  the  space  span{vi, . . .,  u*}  spanned 
by  n  x  1  vectors  tii,  . . .,  t>t  is  denoted  by  span^lvj , . . . ,  v*}. 

We  first  show  that  the  size  of  the  ||r,||  reflects  the  linear  dependence  of  the  ith  column  of  A  on 
all  others. 


Theorem  5  Given  a  n  x  m  matrix  A  of  rank  m,  denote  by  a*  its  columns,  and  by  rj  the  rows  of 
its  left-inverse  A* , 


A  =  (ai  ...  am  ) ,  A*  =  (ATA)~1AT  = 
Then  K((A')T)  =  H(A)  and 


r;  = 


a,-  cos  oti 


1  <  i  <  m, 


where  —^ir<ai<~irtsthe  angle  between  n  and  a<. 


Proof:  Because  A  has  full  column  rank,  AT A  is  non-singular,  and  Ax  =  A(ArA)_12  =  (Al)T2, 
where  z  —  AT Ax,  which  implies  that  7J((At)T)  =  R(A). 

The  ith  diagonal  element  of  7  =  AM  satisfies  i  =  rj a,  —  ||r,||  Jja,)| cosa,,  where  a,  is  the  angle 
between  r,  and  a,  .  Hence  cos  a,-  >  0,  so  —  %ir  <  a,-  <  and  |NI  =  .  ■ 


Because  ej  =  rj  A,  r,  is  orthogonal  to  all  columns  of  A  except  for  Oi,  that  is  r<  G  span^^a*}, 
see  Figure  1.  Theorem  11  and  Corollary  5  of  Appendix  1  show  that  the  ith  row  rj  of  A *  has  the 
same  direction  as  the  residual  in  the  least  squares  approximation  of  column  a,  by  the  remaining 
columns:  if  Ai  contains  all  columns  of  A  except  for  a;  then 


1 


1 


rT  -  e?A »  = _ - _ — 


a,  ||  cosa*  ||a,| 


where  —a,  =  AiZ  -  a,  is  the  residual  for  the  solution  z  to  the  least  squares  problem  min,,  ||A,y  —  a,  ||. 
In  other  words,  dr  is  the  projection  of  m  onto  the  orthogonal  complement  of  7 2(A,),  and  r,  has  the 
same  direction  as  a 


With  regard  to  the  length  of  it  follows  that 


pili  cos  a,- 

This  means,  the  better  the  remaining  columns  Ai  approximate  a,-  the  smaller  is  the  residual  p,j| 
and  the  larger  is  ||r,||-  That  is,  the  more  linearly  dependent  a,  is  on  the  other  columns,  the  larger 
is  llr-iU- 

The  relationship  between  the  length  of  r,-  and  the  norm  of  the  residual  is  already  known.  In  [21] 
Stewart  uses  a  different  argument  to  show  that 

Pill  =  min||A,y-a,||  = 

»  INI 

Our  contribution  here  is  to  provide  more  justification  for  the  choice  of  r,  as  an  indicator  of  sensitivity. 
Because  r,  is  a  multiple  of  the  residual  an,  the  residual  is  inherent  in  A  and  thus  represents  a  most 
natural  choice  for  sensitivity  measure. 

Our  geometric  interpretation  of  the  rows  of  the  left-inverse  justifies  the  use  of  rank-revealing  QR 
factorisations  to  determine  the  most  linearly  dependent  column  of  a  matrix.  If  the  permutation 
matrix  P  for  the  QR  factorisation  AP  =  QR  is  chosen  so  that  the  trailing  diagonal  element  |(/Z)mm| 
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of  R  is  minimal,  then  the  residual  l/||r^|j  =  l/||e^/2~1||  =  |(/?)mm|  is  minimal.  This  implies  that 
the  last  column  of  AP  is  the  column  that  can  be  best  approximated  by  all  other  columns  and  so  is 
the  most  linearly  dependent  among  all  columns. 


The  individual  components  of  the  solution  x  to  a  least  squares  problem  miny  [[Ay  —  6||  can  be 
expressed  as 


Xi 


INI  H&ll  cos/?t- 


[|6(1  cos  & 
IN  1 1  COS  or,-  ‘ 


The  denominator  of  x,  indicates  the  linear  dependence  of  column  a,  on  all  others,  while  the  numerator 
indicates  the  contribution  of  the  right-hand  side  6  in  span^,{at}.  In  detail,  for  fixed  6,  the  smaller 
the  contribution  of  a<  outside  the  space  spanned  by  the  other  columns,  the  larger  is  x,-.  Or,  the 
smaller  the  contribution  of  a,-  outside  the  space  spanned  by  the  other  columns,  the  more  x,  has  to 
make  up  for  the  weakness  of  a,  in  the  direction  span^^a*}.  Moreover,  the  shorter  aj  is,  the  larger 
x,  has  to  be  because  it  has  to  make  up  for  the  shortness  of  a.i . 


We  can  also  apply  the  geometric  interpretations  to  the  errors  resulting  from  perturbations  /  in 
the  right-hand  side.  The  expression  for  the  absolute  error  from  Theorem  2, 

r  -r  -  ll/ll  cos  ^ 

*  INI  cos  a,’ 

contains  a  large  amplification  factor  |[g||^a  if  column  a,  is  short  or  lies  almost  in  the  space  spanned 
by  the  other  columns.  The  relative  error 


Xi  -  Xi 


COS  Pi 


tb  cos  rpi 


contains  a  large  amplification  factor  1/cos  Pi  if  6  lies  almost  in  the  space  spanned  by  the  other 
columns  or  in  Ker(AT)  =  Ker(AJ)  (in  the  latter  case  the  right-hand  side  of  the  normal  equations  is 
zero).  Note  that  the  amplification  factor  for  the  absolute  error  only  reflects  the  linear  independence 
of  the  matrix  columns,  yet  ignores  their  relation  to  the  right-hand  side. 


2.5  Implications  for  Column  Scaling 

A  diagonal  column  scaling  D  of  the  least  squares  problem  min,,  ||Ay-6||  to  minz  ||(AD)z  —  6||,  where 
D  =  (djj)  is  a  non-singular  diagonal  matrix,  changes  only  the  lengths  of  the  columns  but  not  the 
angles,  so 

z_  _  WbW  cos  Pj 

||d,-,a,  ||  cos  a; ' 


In  case  of  a  column  equilibrated  matrix  AD,  Section  3.5.2  in  [12],  and  [24,  25],  where  the  diagonal 
matrix  D  is  chosen  so  that  all  columns  of  AD  have  identical  length,  the  condition  number  of  AD 
comes  from  the  largest  angle  am«  of  A,  as 


1 

COS  Otmax 


<||AD||||(AD)*||< 


COS  Olmax 


according  to  Theorem  3.  This  bound  already  appeared  in  a  different  form  in  [21]. 


Van  der  Sluis  has  shown  that  a  column  equilibrated  matrix  A  has  the  lowest  condition  number 
among  all  matrices  of  the  form  AD  [24].  This  would  suggest  that  one  solve  only  linear  systems  and 


11 


least  squares  problems  with  column  equilibrated  matrices  so  as  to  minimise  the  condition  number 
in 


<  y/mK^A) 


Ml 

PIIIMI 


et. 


However,  the  condition  number  occurs  in  an  upper  bound ! 


An  examination  of  the  first  expression  for  the  component- wise  relative  error  in  Theorem  2  shows 
that  none  of  the  angles  change  when  the  columns  of  A  are  multiplied  by  non-zero  scalars.  In 
particular,  if  we  consider  instead  the  system  (AD)r  =  b,  where  z  =  D~xx,  then  the  computed 
solution  z  satisfies  a  perturbed  system  ADz  =  b  +  g.  Postmultiplication  of  A  by  D  corresponds  to 
premultiplication  of  A*  by  D~x,  which  changes  only  the  lengths  of  the  rows  rj  in  A1  but  preserves 
the  angles  /?,  between  b  and  r,-.  Hence  the  amplification  factor  1  /  cos  Pi  remains  invariant  under 
column  scaling. 

Therefore,  if  perturbations  are  restricted  to  the  right-hand  side,  then  column  scaling  is  only 
beneficial  if  it  manages  to  decrease  the  relative  perturbations  cos  t/>;  in  the  component-wise  relative 
error  (this  could  occur,  for  instance,  if  column  scaling  brings  about  a  different  choice  of  pivots  in 
Gaussian  elimination). 


2.6  Summary 


The  main  result  of  Section  2  is  the  pair  of  expressions  for  the  component-wise  relative  errors  in  a 
full-rank  least  squares  problem  when  perturbations  are  restricted  to  the  right-hand  side  (Theorem  2). 


Suppose  x  ^  0  solves  the  least  squares  problem  min,  ||Ay  —  6||,  and  x  solves  the  corresponding 
problem  miny  || Ay  —  (b  +  /)||  with  a  perturbed  right-hand  side.  The  relative  error  in  an  individual 
component  of  x  can  be  expressed  as 


-  Xi 


Xi 


1 

COS  Pi 


e b  cos  fa, 


where  Pi  is  the  angle  between  b  and  the  ith  row  of  Ax ,  ipi  is  angle  between  /  and  the  ith  row  of  A * ,  and 
=  ||/||/||6||.  Thus,  the  component-wise  relative  error  consists  of  a  relative  perturbation  e >,  cos  4>i , 
amplified  by  1/cos  This  amplification  factor  is  large  if  6  is  almost  orthogonal  to  the  ith  row 
of  A* ;  that  is,  if  6  lies  almost  in  the  space  spanned  by  the  other  columns  or  in  Ker(Ar )  =  Ker(A*). 


Because  pi  depends  only  on  the  direction  but  not  the  length  of  the  ith  row  of  A * ,  column  scaling 
of  A  is  only  beneficial  if  it  manages  to  decrease  the  relative  perturbations  ejcosti’.. 


We  also  gave  a  second  expression  for  the  relative  error 


Xi  -  Xi 


Xi 


Ml 

MINI 


Pfl  IMI  OS&, 


which  provides  a  clear  separation  of  the  factors  responsible  for  the  loss  of  accuracy  in  the  computed 
solution:  relative  magnitude  of  the  solution  components  ||i||/a:,-;  matrix  condition  ||.<4||  ||r,||;  and 
relationship  between  matrix  and  right-hand  side  nJ||-|^n  >  7^.  where  k(A)  =  PUP*!!  is  the 
matrix  condition  number.  In  case  of  a  linear  system  Ax  =  b, 


_|j6||__  JAxjL 
piiwrptiMi-  ’ 

otherwise  there  is  no  bound  as  b  may  be  almost  orthogonal  to  all  rows  of  A*. 
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The  component-wise  relative  error  tends  to  be  large  for  those  components  x,  whose  size  is  small 
in  comparison  to  ||ar||,  or  whose  matrix  condition  number  ||i4||  ||r,||  is  large,  or  whose  right-hand  side 
is  nearly  orthogonal  to  all  rows  of  A*.  Moreover,  Theorem  3  shows  that  there  must  be  at  least  one 
component  **  for  which  ||A||  ||rt||  >  K(A)/y/m.  In  the  special  case  when  Ax  =  b  and  6  is  a  singular 
vector  associated  with  the  smallest  singular  value  of  A, 


Mil  _  1 

Pll  11*11  «(*)’ 


and 


11*11 


Pll  11*11 


<  i. 


and  the  sensitvity  of  all  solution  components  to  right-hand  side  perturbations  is  solely  determined 
by  their  relative  magnitude. 


In  the  next  section  we  derive  expressions  for  component-wise  relative  errors  when  perturbations 
in  the  matrix  are  also  allowed.  For  simplicity  we  start  with  linear  systems,  and  consider  least  squares 
problems  separately  in  the  subsequent  section. 


3  Perturbation  Results  for  Linear  Systems 


We  derive  expressions  for  component- wise  errors  in  a  linear  system  of  full  column  rank  when  both 
matrix  and  right-hand  side  are  perturbed.  From  these  expressions  we  derive  component-wise  condi¬ 
tion  numbers  for  the  individual  components  of  the  solution.  The  expressions  for  the  component-wise 
errors  are  used  in  turn  to  derive  upper  bounds  for  the  norm-based  errors  that  are  essentially  equal  to 
the  conventional  upper  bounds.  We  conclude  that  the  norm-based  bounds  are  as  tight  as  possible. 
If  they  turn  out  to  be  pessimistic  then  this  is  because  one  has  chosen  to  measure  the  norm  of  the 
error  instead  of  the  error  in  individual  components. 


3.1  Component- Wise  Errors 


A  computed  solution  x  to  a  linear  system  Ax  =  6  can  be  viewed  as  the  exact  solution  to  a  perturbed 
system  {A  +  F)x  =  b  +  /.  We  will  determine  how  the  error  in  the  components  of  x  is  affected  by 
the  perturbations  F  and  /. 

The  first  theorem  investigates  the  effect  of  perturbations  in  the  matrix. 


Theorem  6  Given  a  matrix  A  of  full  column  rank  and  6^0  such  that  Ax  =  6,  let  the  computed 
solution  x  ^  0  satisfy  (A  +  F)x  =  6. 

Denote  by  the  angle  between  r,  and  Fx.  Then 

-  11^*11  cos  Ipj 

||<*«||  cosarj  ‘ 

If  H  ^  0  and  eA  =  then 

Xj  -  Xj 
Xi 


1  ||F*II 

cos  ft  ||fc|| 


COS  4>i 


=  ||A||||r,||  eA  cos 

Xi 
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Proof:  In  Theorem  2  we  set  f  —  —Fx  to  get 


x,-  =  n  —  rf  Fx 


1  11^11 

COS  Pi  ||6|| 

||  Fi||  cos  V’i 


|(aj||cosa<  ' 


Dividing  the  whole  equation  by  Zj  gives  the  expressions  for  the  component-wise  relative  error. 


The  first  expression  for  the  component- wise  error  says  that  the  more  b  lies  in  spa n^fa*},  the 
more  sensitive  is  z,-  to  relative  perturbations.  However  a  large  1/ cos/?;  does  not  necessarily  imply 
that  b  has  little  contribution  in  a*.  In  fact,  if  6  =  a,  and  1/  cos  a,-  is  large  then  1/  cos  A  will  also  be 
large  -  in  this  case  cos /?,  reflects  the  linear  dependence  of  the  columns  of  A. 

We  interpret  the  second  expression  for  the  component- wise  relative  error  in  Theorem  6  as  follows: 
the  first  term,  ||x||/x,-,  represents  the  relative  magnitude  of  x<;  the  second  term,  ||A||  ||rj||  =  —  > 

represents  the  linear  dependence  of  the  »th  column  of  A  on  all  other  columns;  and  the  iast  term 
e/i  cos  ipi  represents  a  relative  perturbation  for  the  matrix  in  the  context  of  the  given  linear  system. 
The  component-wise  relative  error  tends  to  be  large  for  those  components  z*  whose  size  is  small  in 
comparison  to  ||z||,  or  whose  associated  column  is  short  in  length  or  nearly  linearly  dependent  on 
the  other  columns.  The  two  amplification  factors  in  the  second  expression  for  the  relative  error  in 
Theorem  6  provide  a  clear  separation  of  the  factors  responsible  for  loss  of  accuracy  in  the  computed 
solution:  relative  magnitude  of  solution  components  and  linear  dependence  of  matrix  columns. 

In  comparison  to  the  error  from  right-hand  side  perturbations  in  Theorem  2,  the  error  from  matrix 
perturbations  in  Theorem  6  does  not  contain  the  term  rffa.  which  describes  the  relationship 
between  matrix  and  right-hand  side.  According  to  Theorem  3  we  conclude  that  there  always  exists 
a  component  x*  whose  sensitivity  to  relative  perturbations  in  the  matrix  is  on  the  order  of  k(A). 
This  is  in  contrast  to  right-hand  side  perturbations,  where  b  has  to  lie  in  a  certain  direction  for  the 
sensitivity  to  be  proportional  to  the  condition  number. 

Before  resolving  this  apparent  contradiction  (in  particular,  when  the  perturbations  are  due  to 
backward  errors  from  algorithms,  which  can  be  shuffled  back  and  forth  between  matrix  and  right- 
hand  side),  we  first  give  an  expression  for  the  component-wise  relative  error  for  a  linear  system  when 
matrix  and  right-hand  side  are  perturbed  simultaneously. 


Corollary  1  Given  a  matrix  A  of  full  column  rank,  and  6^0  such  that  Ax  =  b.  Let  x  ^  0  satisfy 
(A  +  F)x  =  b  +  f. 

Denote  by  ipF,i  Ihe  angle  between  r,-  and  Fx,  and  by  rbf.i  the  angle  between  r*  and  f.  Then 

-  ,  ll/H  COS  Ipf'i  -  ||Fx||  cos  t/>Fr< 

Xi  =  Xi  -I - u — Tj - • 

||<J»||  cos  a,- 


If  Xi  jb  0  and 


then 


x, 


X, 


Xi 


IjFxH 

4  IMIIPII 


1 

||6||  cos  0i 


[IIFxllcos^f,,-  -  H/ll  cos  V/,i] 
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Xi 


Ua  COS  V»F,i 


11*11 

Pll  PI 


ft  COS  V’/,i 


The  second  expression  for  the  relative  error  allows  us  to  state  that,  in  general,  for  every  linear 
system  there  exists  a  solution  component  whose  sensitivity  is  proportional  to  the  condition  number, 
because  the  term  that  could  avoid  this,  uJ||*flg|| »  multiplies  only  the  right-hand  side  perturbations 
but  not  the  matrix  perturbations.  In  addition,  the  following  theorem  shows  that  for  any  x  ^  0  the 
perturbations  can  always  be  allocated  to  the  matrix. 

The  following  theorem  helps  to  resolve  the  discrepancies  in  the  roles  of  right-hand  side  and  matrix 
perturbations.  It  also  justifies  the  representation  of  the  matrix  perturbation  in  the  form  (A  =  |||p||| . 
The  bounds  on  the  norm-based  relative  error  [12,  23],  usually  contain  the  term  pA  —  IPII/PIl  as 
the  representative  for  the  matrix  perturbation.  But  tA  <  pA  and,  as  it  turns  out,  eA  constitutes  the 
smallest  possible  matrix  perturbation. 


Theorem  7  Given  a  matrix  A  of  full  column  rank  and  b  ^  0  such  that  Ax  =  b,  and  a  computed 
solution  x  ^  0,  let  Fmi„  be  the  perturbation  of  smallest  Frobenius  norm  among  all  perturbations  F 
that  satisfy  ( A  +  F)x  =  b  fFmin  also  has  smallest  two-norm  among  all  such  perturbations). 


Ifn  ±  0  and  emin  =  ||Fm,„||/||J4||  then 


Xj  -  Xj 
Xi 


1 

COS  Pi 


fminCOS  Tpi , 


where  0,  is  the  angle  between  Fmj„x  and  r,-. 


If  (ret  —  ||*  —  i4x||/||i>||  is  the  relative  residual  then 


Cmin  — 


11*11  . 

PI  I  Pll  r"’ 


1  Nl, 

||*||Cre'  -  Cm<"  -  p|| 


Proof:  If  /  =  6  —  Ax  is  the  residual  then  Fmm  is  given  by,  [18]  and  Theorem  III. 2. 16  in  [23], 


r~T 


Fmin  — 


and  satisfies 


where  the  second  equality  comes  about  because  Fmin  has  rank  one.  Substituting  ||/||  =  ||FmIO||  ||x|| 
into  the  first  expression  for  the  component-wise  relative  error  from  Theorem  2  yields  the  expression 
for  the  error. 


The  relation  between  cm,„  and  er„  comes  about  as  ere,  =  ||/||/||6||  and  ||Fm,„||  =  ||/ll/ll*ll-  ■ 


The  proof  of  Theorem  7  makes  clear  that,  given  Ax  =  b  and  x,  the  smallest  matrix  perturbation 
satisfies 


{A  +  Fmin)x  =  6, 


Pm, n||  _  I|FX|| 

Pll  Pll  PI’ 


which  is  exactly  the  matrix  perturbation  in  Theorem  6. 
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Moreover,  for  a  given  computed  solution  x  one  can  define  two  perturbations:  the  minimal  matrix 
perturbation  cm,„;  and  the  relative  residual  eret,  which  reflects  the  relationship  between  matrix  and 
right-hand  side.  If  the  magnitude  of  the  computed  solution  is  not  totally  off,  i.e.  if  ||x||  ss  ||x||,  then 
c min  is  of  the  same  order  of  magnitude  or  smaller  than  eTtt.  According  to  Sections  2.1  and  2.2,  eres 
is  much  larger  than  em,n  whenever  b  lies  nearly  in  the  direction  of  a  singular  vector  associated  with 
the  smallest  singular  value  of  A  (provided  the  directions  of  x  and  x  are  not  too  different). 

Regarding  the  interpretation  of  error  bounds  and  the  determination  of  amplification  factors,  one 
must  therefore  be  careful  about  deciding  whether  to  allocate  the  perturbations  to  the  matrix  or 
to  the  right-hand  side.  We  continue  this  discussion  in  the  context  of  norm-wise  error  bounds  in 
Section  3.4. 


In  Theorem  7  the  relative  errors  in  the  different  components  differ  only  in  cos^j/cos/?i,  while 
the  term  )|A||  ||x||/||6||  is  common  to  all  components.  Because  l/cos/?j  >  1, 


Pll  11*11 
Ml 


Cmin  1  COS  |, 


so  all  components  of  x  are  sensitive  to  matrix  perturbations  if  ||A||  ||x||/||6||  is  large.  In  particular, 
if  b  lies  along  the  direction  of  a  singular  vector  associated  with  the  smallest  singular  value  of  A 
then  ||A||  ||i||/||6||  «  ic(A).  Together  with  the  results  from  Section  2.3  this  implies  that  the  solution 
components  are  extremely  sensitive  to  matrix  perturbations  exactly  when  they  are  insensitive  to 
right-hand  side  perturbations. 


The  expressions  for  the  component-wise  errors  in  this  section  contain  not  only  the  data  A  and  6, 
but  also  the  result  x.  In  Appendix  2  we  show  how  to  express  the  relative  errors  entirely  in  terms  of 
the  data;  although  the  perturbations  take  a  slightly  different  form,  the  magnification  factors  for  the 
perturbations  continue  to  be  1/  cos  a,  and  I  /  cos  /?,- . 


3.2  Examples 


Now  we  give  two  examples  to  illustrate  the  previous  results.  The  first  example  demonstrates  that  a 
matrix  with  perfectly  conditioned  columns  may  give  rise  to  a  linear  system  with  extremely  sensitive 
solution  components. 


Example  1  If  A  is  an  orthogonal  matrix  then  ||a  |}clra  =  1  an^’  according  to  Theorem  6,  ||x||/x,- 
is  the  sole  term  responsible  for  error  magnification.  Thus,  as  we  already  know  from  the  noim-wise 
bounds,  a  solution  vector  with  small  as  well  as  large  components  suffers  from  large  error  amplification 
in  the  small  components. 

This  also  comes  out  if  we  consider  instead  the  angles 

1  =  1  =l  1  _  114 

cos  a,-  ||a,||  cosar,-  ’  cos/?,-  x;  ’ 

where  the  next  to  last  equality  comes  about  because  ji6jj  =  ||x||. 

In  contrast  to  the  first  example,  the  second  one  shows  that  even  a  very  ill-conditioned  matrix 
may  have  robust  solution  components.  It  is  a  generalisation  of  the  example  presented  in  Section  1.1. 


1  _  J_ 

||6||  cos/?,  x,-’ 
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Example  2  Consider  a  4x  4  orthogonal  matrix  A  =  (ai  a2  a3  a4 )  and  define  a  one-parameter 
family  of  matrices  by 

>l(A)  =  (ai  a2  a3  ^-L-,(Aa3  +  a4))  . 

We  see  that  ,4(0)  =  A,  a  well-conditioned  matrix,  and  that  ,4(oo)  is  a  singular  matrix.  For  all  X, 
||.4(A)||  <  2.  When  A  <  00,  the  inverse  is  given  by 


V  \/l  +  A2a4  / 


from  which  we  can  compute 


cos<*i  =  jjai||  cos  ai  =  cos  a2  =  ||a2||  cos  a2  =  1 
cosa3  =  ||a3||  cos(a3)  =  cosa4  =  ||a4||  cos(a4)  =  . 

Thus  as  A  — *  00  the  matrix  A(X )  becomes  increasingly  singular.  Its  condition  number  behaves  like 
0(A).  Note  that  the  matrix  .4(A)  is  column  equlibrated,  so  the  ill-conditioning  is  a  result  of  small 
angles  rather  than  short  columns. 


Consider  a  linear  system  ,4(A)x(A)  =  b,  where  the  right-hand  side  is  independent  of  A  and  can 
be  represented  as  b  —  rxai  +  r2a2  +  r3a3  +  r4a4.  Then 


c“A  =  Pi'  cosA  =  H 


cos/?3  = 


TZ  ~  XT4 

IWlvT+U* 


cos0*  =  tSt- 


The  solution  vector  is  given  by 

x(A)  =  (r,  r2  r3  -  Ar4  Vl  +  A 2r4  )T  . 

The  values  of  x\  and  x2  are  independent  of  X,  and  so  are  ||a;||  cosa^-  and  cos  0j  for  j  =  1,2.  So  the 
sensitivity  of  the  components  xi  and  x2  depends  solely  on  their  size  relative  to  x.  If,  for  instance, 
|xi|  |x,|  for  i  ^  1  then  Corollary  1  says  that  the  error  in  xi  is  not  amplified  -  independent  of  the 
values  of  X  and  the  condition  number  of  A{ A). 


3.3  Condition  Numbers  and  Column  Scaling 

For  a  linear  system  Ax  =  6  with  full-rank  coefficient  matrix  A  and  non-zero  right-hand  side  b, 
Corollary  1  presents  two  different  expressions  for  the  component- wise  relative  error  in  the  computed 
solution  x:  suppose  x  ^  0  satisfies  (A  +  F)x  =  b  +  f,  and 

e  -M  ,  _  Ji£fiL 

IIC  A  ~  IMII  11*11 


||6||cos/?j 


[||Fxj|  cos  ipF,i  -  ll/H  cos  V*/,*] 


=  IMIIIWI  |«A  COS  lpF,i  -  Pij-pii«fc  COS  ibj.i 
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Sections  2.2  and  3.1  explain  that  under  certain  circumstances  the  factor  jpjjj J|gy  causes  the  sensi¬ 
tivity  of  large  solution  components  to  right-hand  side  perturbations  to  be  independent  of  matrix 
conditioning.  We  now  ignore  because  it  does  not  affect  the  sensitivity  of  the  solution  to 

perturbations  in  the  matrix. 


The  term  ||A||  ||r,||  =  ||A||  ||ef  A*||  <  ]|.Aj|||At||  represents  a  condition  number  ‘restricted  to’  x,-. 
Already  in  1970  van  der  Sluis  [25,  26]  realised  the  need  to  distinguish  the  conditioning  of  individual 
components  of  x  and  the  fact  that  the  conditioning  depends  on  the  relative  size  of  a  component.  He 
introduces  the  notion  of  ‘ith  column  condition  number  of  A’,  ||A_1||  ||aj||,  and  derives  the  similar 
looking  norm-wise  relative  error  bound  (here  /  =  0) 


He  also  acknowledges  the  importance  of  angles  on  the  conditioning  of  the  matrix:  if  each  column 
is  well  separated  from  the  space  spanned  by  the  other  columns  then  the  solution  components  are 
likely  to  be  insensitive  to  perturbations,  page  251  in  [26]. 


According  to  van  der  Sluis’s  bounds  one  naturally  concludes  that  column  equilibrated  matrices 
(all  of  whose  columns  have  identical  norm)  should  give  rise  to  solution  components  with  identical 
sensitivity  to  perturbations.  Yet,  the  amplification  factor  p]| c*— ^  in  the  first  expression  for  the 
component-wise  relative  error  is  independent  of  column  scaling.  So  essentially  the  conclusions  of 
Section  2.5  remain  valid  when,  in  addition  to  the  right-hand  side,  the  matrix  is  also  perturbed:  the 
component-wise  relative  error  decreases  under  column  scaling  only  if  column  scaling  manages  to 
reduce  the  perturbation  ||Fx||  cost/^j  —  ||/||cosV’/1».  Note  that  we  could  have  also  expressed  the 
error  as 


Xi  X a 


1 


cos  /?,• 


\\m 


ii*ii 


COS  —  €k  COS  0/,i 


in  which  case  the  amplification  factors  for  the  relative  perturbations  ||Fx||/||6||  and  tb  remain  invari¬ 
ant  under  column  scaling.  However,  when  /  =  0  we  know  from  Theorem  7  that 


||Fz||  _  Hb -Az||  Pl|||z|| 

11*11  "  11*11  "  11*11  ^ 


where  ||A||  ||£||/||*||  can  be  as  large  as  «(A).  This  means  that  the  perturbation  ||Fx||/||6||  may  be 
proportional  to  the  condition  number  of  the  matrix.  Finally,  Lemma  1  of  Appendix  2  states  that 
the  amplification  factors  for  the  error  (x,  -  x<)/x,-  remain  invariant  under  column  scaling  when 
perturbations  are  restricted  to  column  i  of  the  matrix. 


Although  the  amplification  factors  in  the  second  expression  for  the  error  above  do  change  under 
column  scaling,  they  have  the  advantage  of  representing  easily  computable  a  posteriori  error  esti¬ 
mates:  we  show  in  Section  6  how  to  estimate  ||A||  ||rj||  efficiently  with  available  condition  number 
estimators. 


Due  to  the  deliberations  in  this  and  the  previous  sections  we  feel  justified  in  introducing  a  new 
set  of  condition  numbers. 


Definition  1  Let  Ax  =  6  be  a  linear  system  with  n  x  m  matrix  A  of  rank  m  and  6^0,  and  let 
x  yt  Q  be  the  computed  solution.  Denote  by  rj  =  ej  A*  the  ith  row  of  the  left-inverse  of  A  and  by  pi 
the  angle  between  b  and  n,  1  <  i  <  m. 

The  quantities 

H.  Mil  INI.  I  <i  <  m, 

\xi  I 
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are  called  ‘component-wise  condition  numbers  for  the  linear  system  Ax  =  6’.  We  also  refer  to  them 
as  ‘condition  numbers  for  xt- 

The  validity  of  this  definition  is  also  justified  by  earlier  work  of  Stewart  [21],  page  72,  who 
introduces  the  ‘collinearity  indices’  k ,  =  ||a,j|  ||r,j|.  The  squares  of  the  collinearity  indices  are  known 
as  ‘variance  inflation  factors’  in  statistics  [21].  From  Theorem  5  we  see  that  icj  =  1/ cos  a*.  They 
represent  the  scaling-invariant  version  of  ||A||  |Jr,j|  and  appear  as  amplification  factors  in  the  error 
expressions  of  Appendix  2.  The  main  difference  between  our  and  Stewart’s  condition  numbers  is 
that  the  collinearity  indices  are  designed  to  reflect  the  linear  dependence  of  the  matrix  columns, 
while  our  component-wise  condition  numbers  measure  the  conditioning  of  the  linear  system:  matrix 
plus  right-hand  side. 


3.4  Conventional  Error  Bounds 

As  we  did  already  in  Section  2.3,  we  now  relate  our  component-wise  results  to  the  conventional 
norm-based  upper  bounds  on  relative  errors. 

Corollary  2  Given  a  n  x  m  matrix  A  of  rank  m  and  6^0  such  that  Ax  —  b,  as  well  as  x  £  0  with 
(A  +  F)x  =  b  +  f,  lei 

,  _  Mi  ,  _  ira 

*"n*ir  "  PiiPir 


1  PI  Pll  „P-*I 

)m.aX  llAtlllxll^  ||x||f^  -  ||xj| 


<  y/rnic(A)  ^ 


x||e4+  ||x||£4  ’ 


where  fifti  =  ||r,||  cos  rp}ii/ max*  ||r*||,  PF,i  =  ||r*'ll  cos  d>F,i/  max*  ||rt||,  and  and  rpF<i  are  the 
respective  angles  between  the  ith  row  r,  of  A*  with  f  and  Fx. 

Proof:  Multiplying  the  last  equation  in  Corollary  1  by  Xj  and  applying  the  infinity-norm  gives 
P  -  *||oo  =  Pll  max  j||A||  ||r,|]  |]jj|~|.[| d  cos -  eA  cos 
yielding  the  upper  bound 


<  max{||i4||  ||rj || } 


_  .Pll,  Pll  r  ,PII, 

*11 4  IN'  \  -  {' }  Limiiii^ii  ‘  Pll  T 


The  lower  bound  is  obtained  from  Theorem  3 


l|*-*||co^  1  ...  \M  Pll 

— r> — Ti —  -  -7=«(^)max  ■;.■■■  .,<>/«/.»  “  Tr-T^APF.i 
11*11  y/m  •  l|A||||x||  P|| 


The  mixed-norm  error  is  replaced  by  the  two-norm  error  by  means  of  the  inequalities  ||j/||oo  < 
Pll  <  x/mplloo  for  any  m-vector  y,  Section  2.2.2  in  [12].  ■ 

We  arrive  at  the  same  conclusion  as  in  Section  2.3,  where  oniy  perturbations  in  the  right-hand 
side  were  allowed.  Without  knowledge  about  cos  d>F:i  and  cos  the  norm-based  bounds  are  as 
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tight  as  possible.  In  fact,  under  weaker  conditions  we  have  derived  essentially  the  same  upper  bound 
as  the  one  commonly  found  in  the  literature  for  non-singular  linear  systems.  In  Section  III. 2. 3  in 
[23],  for  instance,  one  finds  that,  subject  to  the  condition  ||j4-1F||  <  1, 


< 


1  -  k(A)pa 


( Pa  +  fi), 


while  Corollary  2  gives 


<  \/™*('4)jjfjj  ( Pa  +  e»)  ■ 


In  this  last,  commonly  used  form  the  norm-based  bound  ignores  any  relationship  between  matrix 
and  right-hand  side. 


Chan  and  Foulser  [7]  intended  to  remedy  this  ignorance  of  the  right-hand  side  by  modifying  the 
bound  as  follows.  Let 


A  =  UHVT ,  where  U  =  (u  i  ...  u„  ) ,  <?i  >  <?2  >  ■  ■  ■  >  <t„  >  0, 

be  the  SVD  of  a  non-singular  matrix  A  with  singular  values  Oi  and  right  singular  vectors  .  Accord¬ 
ing  to  Theorem  1  in  [7],  if  Ax  =  6+/  and  Pi,  is  the  orthogonal  projection  onto  span  {u„_*+i, . . . ,  u„} 

ii*-*ii  (mbwy1 

11*11  "  *n  V  11*11  )  " ‘ 

Chan  and  Foulser  [7]  conclude  that  if,  for  some  k,  a  large  fraction  of  b  lies  in  span  {un_t+j, . . . ,  u„} 
and  if  <rn_*+1  as  an  then  x  ‘is  relatively  insensitive  to  perturbations  in  b\  For  instance,  if  6  =  u„ 
then  P\b  —  b, 


and  we  conclude  that  x  is  insensitive  to  perturbations  in  b. 


The  interpretation  of  Theorem  1  given  in  [7]  is  valid  if  /  represents  the  input  error  in  the  data  b. 
However  we  do  not  agree  with  the  application  of  Theorem  I  in  the  case  when  /  represents  a  backward 
error  chosen  to  satisfy  Ax  =  6+  /.  As  we  discussed  in  Section  3.1,  /  depends  on  the  size  of  x.  From 
Theorem  7  we  know  that  Fmin  —  ~gTi  *s  the  perturbation  of  smallest  two-norm  and  Frobenius 
norm  satisfying  (A  +  =  6,  and  that 


_1 _ 

COS  Pi  I 
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11*11 


(mini  COS  A\- 


When  6  =  un  the  common  term  ||A||  ||i)|/||6||  is  approximately  <T\/<Tn,  and  the  sensitivity  of  all 
solution  components  is  proportional  to  the  condition  number.  A  slightly  different  argument  based 
on  the  second  statement  in  Theorem  7, 


,  11  Fminll  .  11*11  . 

m,n~  j|A||  ~|M||||x||4’ 

implies  that  for  6  =  un  we  have  c*  as  »c(A)emi„,  and  the  ill-conditioning  is  merely  hidden  in  the 
perturbation  e».  Consequently,  all  components  of  x  are  extremely  sensitive  to  perturbations  if  A  is 
ill-conditioned,  which  disagrees  with  the  interpretation  by  Chan  and  Foulser. 
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3.5  Summary 


The  main  results  in  this  section  are  expressions  for  the  component-wise  errors  in  a  linear  system 
of  full  column  rank  when  perturbations  are  allowed  in  both  the  matrix  and  the  right-hand  side 
(Corollary  1). 


Suppose  A  is  a  matrix  of  full  column  rank  and  b  0  such  that  Ax  =  b.  Let  x  yb  0  satisfy 
(A  +  F)x  =  6  +  /.  If 

t  ii^ii 

4  n*ir  A  IMIINI 

then  the  relative  error  in  an  individual  component  of  x  can  be  expressed  in  two  ways, 


Xi  -  Xi 


Xi 


||*||  COS/?, 

M 

Xi 


[|jF z||  cos  V>F,i  -  ||/||  cos 


t  A  COS  ~ 


11*11 


—  (b  COS 


where  rj  is  the  ith  row  of  the  pseudo-inverse  A* ,  /?,■  is  the  angle  between  6  and  r,,  and  ipF,i  and  V'/,i 
are  error  angles. 

In  the  first  expression,  the  amplification  factors  ^  g  are  invariant  under  column  scaling. 
Hence  the  component-wise  relative  error  decreases  under  column  scaling  only  if  column  scaling 
manages  to  reduce  the  size  of  the  perturbations. 

In  the  second  expression,  the  perturbations  are  amplified  by  two  terms:  ||x||/z,  represents  the 
relative  magnitude  of  r<,  and  ||A||  1 1 1 1  represents  the  dependence  of  the  ith  column  of  A  on  all  other 
columns.  Hence  the  component-wise  relative  error  tends  to  be  large  for  those  components  i,  whose 
size  is  small  in  comparison  to  ||f||,  or  whose  associated  column  is  short  in  length  or  nearly  linearly 
dependent  on  the  other  columns. 

Theorem  7  demonstrates  that  any  x  ^  0  can  be  viewed  as  the  solution  to  a  linear  system  whose 
perturbations  affect  only  the  matrix  and  leave  the  right-hand  side  clean.  Hence  the  amplification 
factor  for  each  z,  is  at  least  ||v4||  ||r<||.  According  to  Theorem  3  there  always  exists  a  component  Xk 
for  whom  j|A||  ||rt||  is  on  the  order  of  x(A).  Thus  any  linear  system  contains  a  solution  component 
whose  sensitivity  to  relative  perturbations  is  proportional  to  the  condition  number  of  the  matrix. 

The  quantities 

M  \\A\\\\eT A'\\,  \  <i  <m, 

\Xi\ 

are  called  component-wise  condition  numbers  for  the  linear  system  Ax  =  6. 


4  Perturbation  Results  for  Least  Squares  Problems 


We  saw  in  Section  2.1  that,  provided  the  matrix  has  full  column  rank,  perturbations  in  the  right-hand 
side  have  the  same  effect  for  both  linear  systems  and  least  squares  problems.  However  this  is  not 
true  for  perturbations  in  the  matrix.  A  perturbation  F  in  a  linear  system  (A-)-  F)x  —  b  represents  a 
‘linear  disturbance’,  and  it  does  not  interact  with  the  right-hand  side.  In  contrast,  a  perturbation  F 
in  a  least  squares  problem  miny  ||(A  +  F)y  —  *||  represents  a  ‘quadratic  disturbance’,  because  the 
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perturbed  problem  is  equivalent  to  the  linear  system  (^4+ F)T(A  +  F)x  =  ( A+F)Tb .  In  addition,  the 
perturbation  F  causes  a  second  quadratic  disturbance  if  the  right-hand  side  is  perturbed  separately. 

As  in  Section  3  we  derive  expressions  for  component-wise  errors  in  a  least  squares  problem  when 
both  matrix  and  right-hand  side  are  perturbed.  There  exists  a  component  of  the  solution  vector 
whose  sensitvity  equals  at  least  the  product  of  condition  number  and  tan#,  where  6  is  the  angle 
between  the  right-hand  side  and  the  column  space  of  the  matrix;  the  sensitivity  can  be  as  high  as 
the  product  of  tan  9  and  the  square  of  the  condition  number.  Least  squares  problems  are  therefore 
always  more  sensitive  to  ill-conditioning  than  linear  systems. 

Finally,  from  the  expressions  for  the  component-wise  errors  we  derive  an  upper  bound  on  the 
norm-based  error  that  is  essentially  equal  to  the  first-order  term  in  the  conventional  bound. 

The  perturbation  results  in  this  section  can  be  applied  to  the  computation  of  the  left-inverse  by 
expressing  it  as  the  solution  X  =  A 1  to  the  least  squares  problem  miny  ||Ay  —  /||,  and  computing 
one  column  of  A  at  a  time.  Norm-based  perturbation  results  for  pseudo-inverses  can  be  found,  for 
instance,  in  Section  III  of  [23]. 


4.1  Component- Wise  Errors 


Now  we  consider  perturbations  in  the  coefficient  matrix  of  a  full-rank  least  squares  problem  miny  ||  Ay  —  6||. 
The  following  theorem  shows  that  the  component-wise  errors  in  a  least  squares  problem  consist  of 
the  errors  in  a  linear  system,  plus  a  further  term. 


Theorem  8  Given  a  matrix  A  of  full  column  rank,  let  x  gt  0  solve  miny  ||  Ay  —  6||  and  let  x  ^  0 
solve  min„  ||(A  +  F)y  -  (6  +  /)||. 


Let 

rJ  =  efA*,  qj  —  ef  (AT  A)~l ,  f  =  b  +  /  -  (A  +  F)x  0, 

then 

-  ||Fj||cos^F.i  -  11/11  cos V>/it  |[Frr|| cosw^ 

ll°i|l  cos  ai  ||a<||2cos2a, 

where  Me  angle  between  Fx  and  r,-,  is  the  angle  between  f  and  r*,  w*  is  the  angle  between 

r,  and  r,  and  i  is  the  angle  between  FTr  and  qi. 


If  Xi  /  0  and  cA>r  =  Men 

Xi  -  Xi 


r  x  IHI  1 

=  L  +  ttttt  - —  COSW,' 


COS  0i 


=  L  + 


Mil  11*11 


^  lk.il  Mll’^.r  COS Wf>jl 


where 
L  -  - 


||t||co»ft  HlFillcos^r.  -  ll/ll  cos  <7.ll  =  IMII  Ill-ill 


<'lcoste'Piuil<‘ 


is  the  component-wise  relative  error  in  a  linear  system  solution  from  Corollary  1. 
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Proof:  The  vector  x  solves  miriy  || Ay  —  6||  if  and  only  if 


(A  *)(:)-(!)■ 


where  r  =  b  —  Ax,  and  the  inverse  of  the  coefficient  matrix  equals 


(I  A\-1_//-AA*  (A')t  \ 

\At  Oj  ~  {  At  ~(AtA)~1)- 


Moreover  the  vector  x  solves 

We  can  now  apply  Theorem  2  to  the  above  system,  whose  right-hand  si  Jo  perturbation  equals 
-  (^Ffr/y  l{lT  =  eT(ATA)~1  then 

it  =  Xi-rT(Fx-  f)  -  rfr  =  Xi  -  rj (Fx  -  /)  +  qj FTf 
because  FTf  —  —ATf.  Expressing  the  inner  products  in  terms  of  cosines  and  norms  gives 


x,  =  x,  - 


[IFillcosi/ii^  -  H/ll  cos  tfr/.i  _  |[f  H  cos u;,- 
||a,||cosarj  ||a,||cosaj 

||Fj||  cos  -  H/ll  cos  ip/'j  T 


ad  cos  a, 


+  ||9.il  ||FTf  ||  cos  wfi<> 


where  4>r.i  is  the  angle  between  r,  and  Fx,  V»/,»  is  the  angle  between  r,  and  /,  u n  is  the  angle  between 
r,  and  f ,  and  wfij  is  the  angle  between  qi  and  FTr. 

The  second  expression  for  the  relative  error  follows  from 

ll?.lll|fTr||  _  Mil  ,,  „  „  Ilrll  l[FTf|| 

r,-  “  r,  119,1111/411  ||A||||i||  IM^IIHfir 


The  above  theorem  contains  as  special  cases  the  perturbation  results  derived  before.  If  x  happens 
to  solve  the  linear  system  ( A  +  F)x  —  b  +  /  then  f  =  0,  and,  as  the  first  expression  for  the  relative 
error  shows,  the  errors  reduce  to  those  in  a  linear  system  from  Corollary  1.  If  F  happens  to  be  zero 
then  ATf  =  0,  so  cosuq  =  0  and  the  errors  reduce  to  those  due  to  pure  right-hand  side  perturbations 
from  Theorem  2. 

Theorem  8  provides  two  expressions  for  the  component-wise  relative  error,  they  differ  in  the  form 
of  the  additional  term  due  to  the  least  squares  nature  of  the  problem.  We  will  now  examine  them 
in  turn. 

The  perturbation  in  the  first  expression  for  the  relative  error  in  Theorem  8  is  cosw,.  As  argued 
above,  coauq  is  zero  whenever  F  is  zero.  The  perturbation  is  amplified  by  1/ cos/?,-,  which  indicates 
how  linearly  dependent  b  is  on  the  space  spant^j{at}.  The  term  ||r||/||6||;  is  independent  of  *,  hence 
present  in  the  relative  error  of  each  solution  component.  If  9  is  the  angle  between  6  and  11(A)  then  we 
show  in  Section  4.3  thai  |^|  =  sin#  <  1,  where  r  =  b  —  Ax  is  the  exact  residual.  Thus,  if  ||f||  «  ||r|| 
then  ||f||/||6|j  controls  the  influence  on  the  relative  error  from  the  additional  term  due  to  the  least 
squares  problem.  This  term  has  a  greater  influence  on  the  error  when  the  distance  of  b  to  7 2(A)  is 
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large:  77(A)  =  77((Af)r)  by  Theorem  5,  so  if  6  is  almost  orthogonal  to  rj  =  ej  A *  then  cos  /?,  ss  0 
and  1/ cos  0i  is  very  large.  The  advantage  of  the  first  expression  for  the  component-wise  relative 
error  in  Theorem  8  is  the  invariance  of  its  amplification  factor  under  column  scaling.  However  it 
seems  to  be  difficult  to  get  a  handle  on  cos  w,-. 

That  is  the  reason  why  Theorem  8  contains  an  alternative  expression  for  the  component- wise 
relative  error.  Although  individual  factors  in  the  alternative  expression  change  under  column  scaling, 
we  find  them  easier  to  interpret.  The  relative  perturbation  e^,r  cosc*i?i;  in  the  least  squares  term  is 
amplified  by  three  factors.  The  first  factor  represents,  as  in  the  error  for  linear  system  solution,  the 
size  of  the  component  Xi  relative  to  ||x||.  The  second  factor  |[ijj||  ||A||2  has  the  bounds 

(INIIMII)2  <  MIMII2  =  ||eT(A7\4)-||||M||2  <  ||(AtA)-1||||A||2  =  k2(A), 

where  we  have  made  use  of  the  inequality  ||g,||  >  ||rj||2  from  Corollary  6  in  the  Appendix  and  the 
fact  that  ||ArA||  =  |jA||2.  From  Theorem  3  we  know  that  there  exists  a  row  r*  of  A 1  whose  norm 
approximates  A 1  to  a  factor  of  ym.  Hence  there  exists  at  least  one  component  Xk  for  which 

IMIII*I|S  >  — «2(A). 

m 


The  third  factor  multiplying  the  relative  perturbation,  [| |f[f ,  describes  the  relationship  between 
matrix  and  right-hand  side.  It  will  be  examined  by  considering  instead  the  exact  quantity  . 

A  few  paragraphs  ago  we  introduced  the  angle  #  between  b  and  77(A),  so 


H 

m 


=  sin  #, 


11**11  _  \\b~r\ 

11*11  11*11 


This  implies  the  equality 


INI  =  li**ll 
11*1111*11  11*11  INI 


Since  ||x||  =  \\A*Az\\  <  ||Al||||A*||  we  get  the  bounds 


cos  9. 


1 

«(*) 


tan#  < 


11*11  w 


tan  9. 


Combining  all  the  bounds  for  the  exact  quantities  shows  that  for  some  Xk 
^*(A)  tan  9  <  ||<?*||  ||A||2  <  «2(A)  tan  9. 

Consequently,  whenever  ||f||  as  ||r|j  and  j|x||  as  |jx||,  there  exists  a  so.jtion  component  whose  sen¬ 
sitivity  equals  at  least  the  product  of  condition  number  and  tan#,  and  it  may  be  as  high  as  the 
product  of  tan#  and  squared  condition  number. 

Given  a  computed  solution  x  to  a  linear  system,  Theorem  7  shows  how  to  construct  minimal-size 
perturbations  F,  and  gives  expressions  for  the  corresponding  component-wise  relative  errors.  In 
the  case  of  least  squares  problems,  unfortunately,  we  do  not  know  how  to  construct  minimal-size 
perturbations  for  a  given  computed  solution  x.  Therefore  the  analogue  of  Theorem  7  for  least  squares 
problems  below  is  not  nearly  as  strong.  When  the  exact  residual  r  =  6  —  Ax  =  0  the  expression 
below  equals  the  one  in  Theorem  7. 


Theorem  9  Given  a  matrix  A  of  full  column  rank,  let  x  f  0  solve  mins  ||Ay  —  6||.  Denote  the 
computed  solution  by  x  ^  0,  the  exact  residual  by  r  =  6  —  Ax,  and  the  ‘computable  residual’  by 
rr  =  b—  Ax. 
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If  X(  ^  0  then 


Xi  -  Xi 


X; 


1  %/lirell2  -  l|r||2 

COS  Pi  ||6|| 


COS  Ipi, 


where  tf>i  is  the  angle  between  re  —  r  and  the  ith  row  r,-  of  AK 


Proof:  According  to  Theorem  III.5.5  in  [23],  the  computed  solution  x  solves  the  least  squares  problem 
min,  ||(A  +  Fo)y  -  6||,  where 

(re  —  r)xT 

r  o  — - rsrr - ■ 

xl  x 

When  r  =  0  then  Fq  equals  Fm<„  from  Theorem  7. 

We  want  to  substitute  Fq  into  the  second  expression  for  the  component-wise  relative  error  in 
Theorem  8.  To  this  end,  note  that  by  construction  of  Fo,  Fqx  =  re  —  r.  Moreover,  Fq  r  =  0  because 

r  =  b  -  (A  +  F0)x  =  re  -  Fqx  =  r, 


and 

0  =  (A  +  Foff  =  ATr  +  F0Tf  =  F0Tr 
due  to  ATr  =  0  and  the  first  part  of  the  proof  of  Theorem  8. 


Theorem  8  gives  therefore 


X{  -  Xi 


Xi 


1  lkc-r]| 

cos  A  ||J>|| 


cos  ipi , 


where  V>i  is  the  angle  between  r*.  At  last,  ||re  —  r||2  =  ||rc||2  —  ||r||2  as  rTre  =  rT(A(x  —  x)  +  r)  =  ||rjj2. 

■ 


We  can  now  define  the  set  of  condition  numbers  for  least  squares  problems,  it  contains  the  set 
of  condition  numbers  for  linear  systems. 


Definition  2  Let  x  ^  0  solve  the  least  squares  problem  min,  || Ay  —  6||  with  n  x  m  matrix  A  of 
rank  m.  Let  x  /  0  be  the  computed  solution  with  residual  r  £  0.  If  qi  =  cj {AT A)~l  and  r J  =  ej A * 
then  the  quantities 

H  IMIIIk.ll,  ppjjj  IMII’IMI,  i  <.<”■, 

are  called  component- wise  condition  numbers  for  the  least  squares  problem  min,  \\Ay  —  6||.  We  also 
refer  to  them  as  condition  numbers  for  x,. 


4.2  Example  and  Conventional  Error  Bounds 

We  now  modify  Example  2  for  linear  systems  to  illustrate  that  a  least  squares  problems  with  a  very 
ill-conditioned  matrix  may  have  extremely  insensitive  solution  components. 


Example  3  Consider  a  4x4  orthogonal  matrix  A  =  ( ai  a?  03  04)  and  define  a  one-parameter 

family  of  rectangular  matrices  by 

A(A)=(<J2  03  ^i-y(Aa3  +  a4)) . 


25 


We  see  that  4(0)  is  a  perfectly  conditioned  matrix,  and  that  4(oo)  has  two  linearly  dependent 
columns.  When  A  <  oo,  the  left-inverse  is  given  by 


from  which  we  can  compute 


INI  =  i,  INI  =  INI  =  n/i  +  a2 


and 


[4(A)T4(A)]_1  = 


0 

l 

/l+A5 


0 

1  + A2 


0 


-AVT  +  F 


-AV1  +  A2 
1  +  A2 


with 


Ikill  =  i.  INI  =  INI  =  n/(i  +  a2)(i  +  2A2). 

Thus  as  A  — *  oo  the  matrix  AT(X)A(X)  becomes  increasingly  singular,  and  its  condition  number 
behaves  like  0( A2). 


Consider  a  least  squares  problem  min,,  ||4(A)x(A)  —  6||,  where  the  right-hand  side  is  independent 
of  A  and  can  6e  represented  as  b  =  TjOi  +  r2a 2  +  r3a 3  +  r4a4.  Solution  vector  and  residual  are  given 

h 

x(A)  =  (  t2  r3  —  Ar4  v/1  +  A2r4  )T  ,  r  =  riat . 

The  value  of  xy  is  independent  ofX,  and  so  are  ||rj||,  ||gj|j  and  cos/?i  =  t2/J|6||.  Hence  the  sensitivity 
of  xy  depends  solely  on  its  size  relative  to  x,  and  the  distance  ||r||/||6||  —  lril/!WI  0/6  to  the  column 
space  of  A.  If,  for  instance,  |xj|  »  |x<|  and  |rv|  <  |r,j  fori  ^  1  then  Theorem  8  says  that  the  error 
in  Xy  is  not  amplified  -  independent  of  the  values  of  X  and  the  condition  number  of  A(X)T  A(X). 


At  last  we  derive  a  norm-wise  upper  bound  from  the  second  expression  for  the  component-wise 
error  in  Theorem  8,  which  turns  out  to  be  almost  identical  to  the  well-known  first-order  bound. 


Corollary  3  Given  a  matrix  A  of  full  column  rank,  let  x  ^  0  solve  miny  ||4y  —  6|j  and  let  x  jk  0 
solve  minB  ||(4  +  F)y  -  (b  +  /)||.  Suppose  r  =  b  -  Ax  ^  0  and  r  =  b  +  f- (A  +  F)x  ^  0. 


U max  {  T  <  e  then 


where 


< £ la) (~m  . . m\+ mK2(A) 

-  ( ^  VIMil  Ml  IWlPlMi"  {A) 


~rn tan  e  ^  ii  jlfrr  it  ^ tan  6 

«{A)  Mil  Ml 


MUM 


and  9  is  the  angle  between  b  and  11(A). 


Proof:  Multiplying  the  second  expression  for  the  relative  error  in  Theorem  8  by  x<  gives 


*-x||oo  =  ||x||  max 


ax  j  Mil  IN! 


eA  cos  Vr.i  - 


II*!! 


Mil  llx|| 


Cfc  cos 


\\A\\\\x 


ITT  \\<li\\  l|A||2e^ 


,r  COS  | , 
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where 


Ml  ,  __M!L  ,  _J£2L 

11*11’  ""IMIINI’  *'r  IMTlllk1l 

are  all  bounded  above  by  e.  The  proof  follows  from  these  inequalities  and  dividing  by  ||z||.  The 
upper  and  lower  bounds  on  ||^|p|x||  were  derived  in  Section  4.1.  ■ 


The  upper  bound  on  the  norm-wise  error  from  Corollary  3  resembles  very  much  the  first-order 
bound  on  page  229  in  [12],  where,  subject  to  the  condition  that  e  <  <c(j4), 


*  -  * 


££[“(^KRrib+I)+'‘!('4)l"" 


+  0(c>). 


If  one  refrains  from  replacing  |j^|r||r||  by  its  upper  bound  tan  0  then  the  norm-based  conventional 
bounds  are  essentially  tight.  Unfortunately  the  justification  is  not  as  strong  as  that  for  linear  systems 
because  we  do  not  have  clean  lower  bounds  on  the  norm-based  error.  In  related  work,  Van  der  Sluis 
[27]  has  shown  that  for  any  least  squares  problem  there  exist  perturbations  that  render  the  upper 
bound  on  the  norm-based  error  proportional  to  the  square  of  the  condition  number. 


4.3  The  Residual 


In  this  section  we  briefly  examine  the  error  in  individual  components  of  the  residual  r  =  6  —  Ax.  To 
this  end  denote  by  P  =  AA(  =  A(AT A)~x AT  the  orthogonal  projector  on  the  column  space  H{A) 
of  A,  and  by  Px  =  I  —  P  the  orthogonal  projector  on  the  orthogonal  complement  of  71(A).  Hence, 
if  r  =  b  —  Ax  is  the  residual  of  the  least  squares  problem  miny  \\Ay  —  6[|  with  solution  x,  we  can 
write  r  =  PLb. 

Remember  that  the  residual  of  the  computed  solution  x  is  represented  by  f  =  (A  +  F)x  —  ( b  +  f ) 
and  that  it  is  different  from  the  ‘computable  residual’  6  —  Ax. 


Theorem  10  Given  a  matrix  A  of  full  column  rank,  let  x  ^  0  solve  miny  \\Ay  —  6||  so  that  r  — 
b  —  Ax  0,  and  let  x  solve  miny  \\(A  +  F)y  —  (b  +  /)J|  so  that  r  =  b  +  f  —  (A  A  F)x  ±  0. 


then 


z%  =  *i  ~  llP.^II  (11^*11  c°s  -  H/ll  cos  «A/,,)  +  ||Pi||||r||cosj/i 

=  *i-  l|PiL|!  (11^*11  COS<^F,i  -  ll/ll  COS  —  llwi|lll^Tfllcos,Vi> 

where  <j>p ,•  is  the  angle  between  Fx  and  pf is  the  angle  between  f  and  ,  i>i  is  the  angle 
between  pi  and  f,  and  vrj  is  the  angle  between  FTf  and  u;,. 

_  jm_  f  ...  Mrfii 

~\\A\\\\x\y  A'r  lunillrir 


Lei 


<b  = 


Ml 
Ml  ’ 
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If  H  ^  0  and  y ,•  is  the  angle  between  pf-  and  b  then 


-  -||Fx||  cos  <t>F,i  +  ll/ll  COS  +  y==j==  ||r||  cos 


IHI  „  MUNI ,,  , 

—  ~j~  Up*  !!  ij^jj  (cxcosvf.i  — 


Mil  Pll 


—  th  cos  -  llro.ll  Mil  (A, r  COS  VWli  . 


Proof:  The  proof  proceeds  in  a  manner  similar  to  that  of  Theorem  8.  From  the  proof  of  Theorem  8 

(^)C)-(i).  UWXD-C;'). 

which  implies 

<)(:)-(S)-(V)  -  CH:H-r  -^S-O(W) 

So 

r  =  r-PJ-(F*-/)-(At)TFTf. 

Note  also  that  ||p-L||2  =  1  —  ||p,j|2,  due  to  the  symmetry  and  idempotence  of  the  orthogonal 
projectors  P  and  PL  =  I  —  P,  Section  2.6.1  in  [12].  ■ 

We  start  by  examining  the  first  expression  for  the  relative  error  in  Theorem  10.  The  first 
amplification  factor  contains  the  angle  yi  between  b  and  7 ZX(A).  If  b  is  close  to  1Z(A)  then  all  7,  are 
large  and  cos 7,  are  small.  Hence,  a  large  1  /  cos  7*  signals  a  small  component  Zi  of  the  residual  r. 

As  already  mentioned  in  Section  4.1,  the  factor  ||r||/||*||  approximates  ||r||/||6||  =  sin  6,  where  6 
is  the  angle  between  b  and  11(A).  For  the  sake  of  completeness  we  briefly  derive  this  well-known 
relation.  On  one  hand,  the  properties  of  orthogonal  projectors  imply  that 

||r||2  =  ||6  -  Ax||2  =  ||(7  -  P)*||2  =  ||6||2  -  ||P6||2, 


and  thus 

M  L/t  nirtiiy 

11*11  ~  V  v  n*n  / ' 

On  the  other  hand,  the  definition  of  an  angle  between  two  subspaces,  Section  12.4,3  in  [12],  implies 
that  the  angle  0  between  6  and  H(A)  satisfies 

.  bT  Pz  (Pb)TPz 

cos  9  —  max  ^  ■■  max  ... .. ..  _  >.  • 

**>  ||*||  ||Px||  **0  ||*||||Pr|| 

Substituing  b  for  z  and  using  the  Cauchy-Schwartz  inequality  leads  to  the  bounds 

M*ll  .  I|P*II2  <  mTr  (Pb)TPz  ||P6||||Pr||  _  HP*]] 

11*11  "  ||*||  ||P*||-?*o  ||*||  ||Pz||  -  ||*||  ||Pr||  ||*||  ’ 


and  cos 9  =  ||P*||/||*||.  Finally, 


=  v/l  —  cos2  9  =  sin  I 


Therefore,  ||f||/||*||  can  be  expected  to  be  large  when  *  is  far  away  from  the  column  space  of  A. 
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As  for  the  third  amplification  factor,  an  analogous  derivation  shows  that  ||p,||  =  cos  ry ,  where  tj 
is  the  angle  between  e,-  and  7£(A),  so 


IIPsll  _  COS  Ti  _ 

— ,  =  =  — -  =  cot  TV . 

1/1  -  INI2  sinr< 

Hence,  when  e<  is  close  to  Tl{A)  then  the  angle  r,  is  small,  and  cotr,-  is  large.  Like  the  first 
amplification  factor  this  one  also  signals  a  small  z,-. 


Now  we  consider  the  second  expression  for  the  relative  error  in  Theorem  10.  The  first  amplifi¬ 
cation  factor  ||f||/z,-  represents  the  magnitude  of  z,-  with  regard  to  the  whole  vector  r,  which  means 
that  small  components  of  r  tend  to  be  more  sensitive  to  perturbations  than  large  components. 


Furthermore,  we  know  that  there  always  exists  a  row  wj  of  A*  for  which  ||tt>t||  >  ||  and 

|| wjt||  ||A||  >  Hence  there  always  exists  a  component  of  the  residual  whose  sensitivity  to 

perturbations  is  proportional  to  the  condition  number  of  A.  This  also  shows  up  in  the  remaining 
amplification  factor  for  particular  right-hand  side  vectors.  There  always  exists  a  column  pj~  for 
which 


IIpIII  >  ^HPAII  = 


lEils 
v'S  11*11  ~ 


1  ||P16||  1  ||r|| 

x/H  11*11  xA  11*11 ' 


Therefore, 


iini„MifNl  >  J_M  \m\*u  _  J_MJM 

IIPj'  11  ||r||  ||*||  ||r||  "  ^  11*11  ’ 


As  demonstrated  in  Section  3.1,  if  6  is  close  to  a  singular  direction  of  A  associated  with  a  small 
vector  then  the  factor  ||A||  ||x|j/||6||  is  close  to  the  condition  number  /c(A). 


We  conclude  that  at  least  one  component  of  the  residual  in  a  full-rank  least  squares  has  a 
sensitivity  proportional  to  the  condition  number  of  the  matrix.  Note  that  it  is  now  the  columns 
of  that  determine  the  sensitivity  of  the  error  rather  than  the  rows. 


From  the  expressions  in  Theorem  10  one  can  derive  an  upper  bound  on  the  norm-wise  relative 
error  in  the  residual. 


Corollary  4  Given  a  matrix  A  of  full  column  rank,  let  x  £  0  solve  miny  ||  Ay  —  6||  such  that  r 
b  —  Ax  ^  0,  and  let  x  solve  miny  ||(A  +  F)y  —  (b  +  /)||  such  that  f  =  6  +  /  —  (A  +  F)x  ^  0. 


//max 


then 


<  e 


ll^X| 


11*11 


l(l  + 


Mll  11*11 


in)  + 1 


«MxEf!l  <  e  \k(a)  (m + m + ii 
(  ) n*n J  -  ( )(xiwi  imi;  J‘ 


The  second  upper  bound  is  almost  identical  to  the  first-order  bound  (5.3.9)  in  [12] 


r  -  r 


<  f(2x(A)  +  1)  minfl,  m  —  n}  +  0(e2). 


The  term  min  {m  —  n,  1}  accounts  for  the  possibility  m  =  n  where,  since  A  has  full  rank,  r  =  0  and 
PL  =  0. 
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4.4  Summary 


In  this  section  we  have  shown  that  the  errors  in  individual  components  of  a  full-rank  least  squares 
problem  miny  \\Ay  —  6||,  when  perturbationa  are  allowed  in  the  matrix  and  right-hand  side,  consist  of 
the  linear  system  errors  plus  a  term  whose  influence  depends  on  the  relation  between  the  right-hand 
side  vector  and  the  column  space  of  the  matrix  (Theorem  8). 


Given  a  matrix  A  of  full  column  rank,  let  x  ^  0  solve  miny  ||Ay  —  6||  and  let  x  ^  0  solve 
miny  ||(A  +  F)y  —  (b  +  /)||.  Furthermore,  let  f  =  6  +  /  —  {A  +  F)x  ^  0  and  ?,•  be  the  ith  row 
of  (AtA)~1.  If  Xi  0  the  component-wise  relative  error  can  be  expressed  as 


Xi  -  Xi 


Xi 


1H) 

u\m\ 


INI  PIP  (A,r  COS 


where  L  is  the  component- wise  relative  error  for  linear  system  solution, 
represents  an  error  angle. 


Ilf’7>ll 

Rm  - 


and 


There  exists  at  least  one  component  xt  of  the  solution  vector  for  whom 
^/c(A)tanfl  <  lllfcll  iMIl2  <  K2(A)tan0, 

where  8  is  the  angle  between  b  and  TZ(A).  If  ||f||  as  ||r||  and  ||x||  ss  ||z||  then  these  bounds  are  also 
valid  for  the  term  ||9i||  H^H2. 


The  quantities 


l|A!l2||c7 (ATyl)-1||,  1  <  i  <  m, 


l*<l  .  IHI  ||*|| 

are  called  component- wise  condition  numbers  for  the  least  squares  problem  miny  ||/lj/  —  6||. 


The  expressions  for  the  component-wise  error  in  the  residual  demonstrate  that  there  exists  at 
least  one  component  of  the  residual  whose  sensitivity  is  proportional  to  the  condition  number  of  the 
matrix  (Theorem  10). 


As  in  the  case  of  linear  system  solution,  we  use  the  expressions  for  the  component-wise  errors 
to  derive  the  conventional  norm-based  error  bounds  for  the  solution  and  the  residual.  We  conclude 
that  the  conventional  bounds  are  essentially  as  tight  as  possible  but  our  justification  is  not  as  strong 
as  for  linear  systems. 


5  Underdeter  mined  Linear  Systems 

In  this  section  we  discuss  the  solution  of  linear  systems  Ax  =  6  when  A  is  a  n  x  m  matrix,  n  <  m, 
whose  rank  is  m.  Since  this  system  may  have  infinitely  many  solutions,  we  want  to  compute  the 
solution  of  minimal  norm. 


5.1  Minimal-Norm  Solution 

Any  solution  x  of  Ax  =  b  can  be  uniquely  represented  as  the  sum  of  its  constituents  in  the  nuilspace 
and  row  space  of  A,  x  =  xK  +  xR  with  xR  €  TZ(At)  and  xK  €  Ker(A).  If  A*  =  AT(AAT)~ 1  is  the 
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right-inverse  of  A  then  A*  A  is  the  orthogonal  projector  onto  the  row  space  of  A,  Section  5.5.4  in 
[12],  and  xR  =  A^Ax. 

Note  that  the  component  xR  in  the  rowspace  is  the  same  for  all  x.  For  suppose  that  were  not 
the  case,  then  there  would  exist  x  and  y  with 

Ax  =  b,  Ay  =  A,  x  =  xR  +  xK ,  y  =  yR  +  yK ,  y*  #  xR, 

and  xR,  y*  €  7£(AT).  These  conditions  imply  that 

0  -  A(x  -y)  =  A(xr  -  y*)  +  A(xK  -  yK)  =  A(xR  -  y*). 

Hence  xR  —  y^  is  in  both  H(AT)  and  Ker(A).  But  this  is  only  possible  if  xR  —  y*  =  0  due  to  the 
orthogonality  of  the  spaces  7£(AT)  and  Ker(A),  Section  2.6.2  in  [12].  So  xR  must  be  unique. 

Since  xR  is  already  in  the  row  space  of  A,  an  orthogonal  projection  onto  the  row  space  leaves 
xR  unchanged,  xR  =  A^  Ax  =  A^b,  and  x  =  xK  +  A*b.  According  to  the  orthogonality  of  Ker(A) 
and  fc(AT),  ||a:||2  =  ||xK||2  +  ||^fl||2.  which  is  minimised  for  xK  —  0.  Therefore  xR  —  A^b  is  the 
minimal-norm  solution  to  the  linear  system  Ax  =  6. 


5.2  ‘Duality’ 


In  the  previous  section  we  established  that  the  minimal-norm  solution  xR  to  a  linear  system  Ax  —  b 
of  full  row  rank  must  lie  in  the  rowspace  of  A,  so  there  is  a  vector  y  such  that  xR  =  —ATy.  Hence 
xR  satisfies 

xR  +  ATy  =  0,  Axn  =  b. 

In  other  words,  xR  is  part  of  the  solution  to  the  non-singular  linear  system 


At 

0 


Now  remember  that  the  solution  xc  to  the  least  squares  problem  minj,  ||Ay  —  6||  of  full  column 
rank  satisfies 

r+AxC=b,  ATr  =  0. 

In  other  words,  xc  is  part  of  the  solution  to  the  non-singular  linear  system 


UT  o)(^)-(o)- 


Therefore  the  solution  of  the  full  row-rank  linear  system  and  the  solution  of  the  full  column- 
rank  least  squares  problems  constitute  ‘dual’  problems:  the  norms  of  r  and  xR  are  minimised,  so  r 
corresponds  to  xR  while  xc  corresponds  to  y. 


Thus  a  sensitivity  analysis  of  the  component-wise  errors  for  the  underdetermined  system  yields 
results  similar  to  those  of  Chapter  3  for  the  least  squares  problem,  and  we  will  only  give  a  brief 
sketch  here.  The  exact  solution  xR  and  the  computed  solution  xR,  satisfy  the  linear  systems 


(!?)(?)-(!)•  ( 


I 

A  +  F 


(A  +  Ff 
0 
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Hence 


(I-A'A  A t  \(  FTy  \ 
y)  l  Pf)T  -(AAT)~l  )  {FxR  -  f ) 

and 

xR=xR-PLFry-A\FxR-f). 

Like  the  residual  r  in  the  least  squares  problem,  the  sensitivity  of  the  components  of  xR  depends  on 
the  rows  of  the  projector  Px  ;  and  like  the  solution  xc  in  the  least  squares  problem,  the  sensitivity 
of  xR  is  also  governed  by  the  rows  of  A^ . 

The  perturbation  results  can  be  applied  to  the  computation  of  the  right-inverse  by  expressing  it 
as  the  minimal-norm  solution  to  the  linear  system  AX  =  /,  where  I  is  the  n  x  n  identity  matrix. 


6  Computation  and  Estimation  of  Component-Wise  Con¬ 
dition  Numbers 


In  this  section  we  discuss  how  to  compute  and  to  estimate  the  component-wise  condition  numbers 
for  linear  systems  and  least  squares  problems,  defined,  respectively,  in  Sections  2.5  and  4.1. 

Let  x  ^  0  be  the  solution  to  the  least  squares  problem  miny  \\Ay  —  6||  with  n  x  m  matrix  A 
of  rank  m.  If  x  ^  0  is  the  computed  solution  with  residual  r  /  0  then  component-wise  condition 
numbers  are 

M.  MIIIMI.  PTOMII’W'  1SiSra' 

where 

rJ=tjA\  qf  =  eJ{ATA)~l. 

The  condition  numbers  for  a  linear  system  form  a  subset  of  those  for  a  least  squares  problem. 

Numerical  issues  in  the  computation  of  the  ||r,||,  due  to  the  potential  ill-conditioning  of  A, 
are  addressed  in  [20],  and  in  the  context  of  statistical  errors  in  [21].  Now  let  us  consider  the 
computational  requirements. 

The  matrix  two-norm  ||j4||  can  be  bounded  by  the  one-norm  ||,4||i  =  maxi<,<m  ||a,||  via,  Sec¬ 
tion  2.3.2  in  [12], 

“Lpili  <  Pll  <  Vmpili- 

vn 

The  computation  of  the  m  vector  norms  ||a>||  for  P|ji  requires  a  total  of  0(mn )  operations.  The 
relative  sizes  ||x||/|xj|  can  be  estimated  a  posteriori  from  the  computed  solution  x  in  O(m)  operations. 
The  term  can  estimated  a  posteriori  from  the  computed  residual  re  =  b  —  Ax  in  O(mn) 

operations.  This  leaves  the  computation  of  the  |jrj)l  and  ||<ji||. 

If  a  factorisation  of  A  is  available  then  upper  bounds  on  the  ||r,j|  can  be  determined  in  0(n 2) 
operations,  as  shown  in  Section  6,  and  an  estimate  of  the  ||9,j|  can  be  obtained  by  making  use  of  the 
inequality  ||7,j|  >  J | r,- 1 ] 2  from  Corollary  6  in  Appendix  1. 

In  the  following  sections  we  discuss  how  to  compute  the  ||r,||  from  the  QR  decomposition  of  A, 
how  to  compute  them  in  the  special  case  where  A  is  bi-  or  tridiagonal,  and  how  to  estimate  them 
from  a  decomposition  of  A. 
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To  sum  up,  if  A~l  or  a  factorisation  of  A  is  available  then  the  component-wise  condition  numbers 
can  be  computed  or  estimated  with  a  total  of  0(n2)  operations. 


6.1  Computation  of  Condition  Numbers  from  the  QR  Decomposition 

Let 

x  =  0(o) 

be  the  QR  decomposition  of  A,  where  Q  is  a  n  x  n  orthogonal  matrix,  and  Hisamxm  non-singular 
upper  triangular  matrix.  To  compute  ||rj|j  and  ||<jj||  it  is  sufficient  to  work  with  R  instead  of  A,  as 
we  will  now  show. 

We  have 

qj  =  ef  ( AtA)~ 1  =  ef  R~lR~T  =  vf  R~T , 

where  u,-  =  R~Te{.  This  means,  once  u,  has  been  computed,  <?,■  can  be  determined  from  t >,•  by  solving 
the  triangular  system  Rqi  =  t>,-  in  0(m2)  operations.  As  A *  =  ( ATA)~1AT  we  get 

r  f  =  qf  At  =  vf  R~TAT  =  (vf  0  )QT, 

so  ||r,-||  can  be  determined  directly  from  via  ||r,||  =  ||v,||.  Hence,  ||^||  and  |]r,-||  can  be  obtained 
from  the  ith  row  vf  of  R~l. 

In  order  to  accomplish  this  efficiently  we  first  consider  the  case  »  =  m.  The  upper  triangular 
structure  of  R  implies  that  vm  =  R~Tem  =  jem,  where  p  is  the  element  of  R  in  position  (m,m). 
So  ||rm||  can  be  determined  from  the  bottom  element  of  R  via  ||rm|]  =  l/|p|  -  without  inverting  R. 
Substituting  vm  =  jem  in  qm  yields  Rqm  =  ~em.  Therefore,  if  a  QR  decomposition  of  A  is  available, 
||rm||  is  available  right  away  and  the  computation  of  qm  requires  merely  the  solution  of  a  m  x  m 
triangular  system. 

This  process  can  be  carried  out  for  all  i,  and  is  described  in  [20]  for  the  computation  of  ||rj||.  After 
choosing  a  permutation  matrix  P  that  moves  column  i  of  A  to  the  last  position,  and  performing  a 
QR  factorisation  of  the  permuted  matrix  AP ,  proceed  as  for  i  =  m  in  order  to  obtain  ||r,||  and  ||g,-||. 
The  proof  of  Corollary  6  in  the  Appendix  establishes  the  correctness  of  this  procedure.  One  does  not 
have  to  perform  the  QR  factorisation  from  scratch  for  each  permutation  P.  Gragg  and  Stewart  [13] 
show  how  to  efficiently  ‘update’  the  QR  factorisation  from  one  permutation  to  the  next  in  0(m2) 
operations,  see  also  Section  12.6  in  [12]. 

The  next  section  discusses  the  efficient  computation  of  the  ||rj||  for  non-singular  bidiagonal  and 
tridiagonal  matrices. 


6.2  Computation  of  Condition  Numbers  for  Bi-  and  Tridiagonal  Matrices 

In  [14]  Higham  gives  algorithms  for  computing  ||A-,||oo  when  A  is  bi-  or  tridiagonal.  We  modify 
these  algorithms  to  compute  ||r<||,  where  rf  =  ef  A~l . 
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When 


A  = 


/an  a\2 

022  <»23 

033 


\ 


am—  l,m  I 


is  a  m  x  m  non-singular  bidiagonal  matrix,  the  elements  a,j  of  its  inverse  are  given  by  [14] 


f° 

ay  —  <  l /an 

l  Ot.i+lOrj+i  j/aii 


if  i  >  j 
if  *  =  j  ■ 
if  i  <  j 


These  expressions  and  ||r,||2  =  J2T=ia<i  *ea<*  to  f°M°w'nf5  modification  of  Algorithm  2.1  in  [14] 
for  the  computation  of  all  ||r,||2, 


m  —  1  >  i  >  1. 


This  algorithm  requires  a  total  of  5m  operations.  It  incurs  no  round-off  error  from  cancellation 
because  all  quantities  involved  are  non-negative. 


When  A  is  a  m  x  m  tridiagonal  matrix  we  modify  Algorithm  4.2  in  [14]  to  compute  the  two-norm 


of  the  columns  of  A~T .  Assume  that 

/an 

an 

\ 

a2l 

022 

023 

A  = 

032 

033 

am—  l,m 

\ 

Om,m— 1 

amm  / 

be  irreducible,  that  is,  a^f+i  and  Oj+i.,  are  non-zero.  Otherwise  one  can  either  introduce  small 
perturbations  to  make  all  a,,+i  and  a,+iit  non-zero,  or  one  can  treat  A  as  a  block  tridiagonal 
matrix  with  diagonal  blocks  that  are  irreducible  tridiagonal  [14], 


The  inverse  of  a  tridiagonal  irreducible  matrix  AT  can  be  represented  by  means  of  two  vectors  y 
and  z  as  [5,  14,  29] 


ViZjPj  if  »  <  j 
ZiVjPj  if  i  >  j  ’ 


where 


Pi  =  1, 


Pi+l 


=  Pi 


°M+1  ’ 


1  <  J  <  m  —  1 . 


The  products  pj  can  be  computed  recursively  in  2m  operations.  To  illustrate  the  computation  of  y 
and  z  we  write  out  in  full  the  representation  of  A~T , 


/  yizi 

Vl*2 

yiZ3 

yizj 

y2^2 

y*z3 

yi«3 

y2^3 

y3*3 

y\Zm  \ 
y2*m  j 

y3*m 


\ 


yiZm 


2/2  2/3  %m 


Vm  Zm  - 


Pm  ' 


This  implies  A_Tei  —y\  z  and  A~Tem  =  zmpmy.  Set  y\  —  1.  Since  AT  is  tridiagonal,  one  can  solve 
for  t/2 ,  •  •  •>  Vm  from  equations  l,...,m—  1  of  ATy  =  (zmPm)-lem;  and  use  equation  m  to  solve 
for  zm.  Then  use  AT z  =  e i  to  solve  for  . .  .,zj. 
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Since  column  i  of  A~T  equals 

rT  =  Pi(ytZi  U2zi  •••  Vi-izi  yi*i  y»z.+i  •  ••  y»zm), 

we  have  ||r<||2  =  p?(r2 13)=i  I /j  +  y?  IZJLi+i  ZJ)-  All  1|»**||2  can  now  be  computed  in  a  total  of  O(m) 
operations  by  accumulating  the  two  sets  of  partied  sums 

*i  =  y?,  s,=«.-i  +  y?,  2<«<m, 


and 


tm  —  zrn  ’  ~  ti+1  "h  zi  > 


and  then  combining  them  into 


nil3  =  Pi(*?si  +  yJu+i), 


m  —  1  >  i  >  1 , 


1  <  i  <  m. 


When  A  is  a  Hessenberg  matrix  one  can  probably  use  the  expressions  for  A"1 
1 1 r,  1 1 2  in  a  fashion  similar  to  that  for  tridiagonal  matrices. 


in  [5]  to  compute 


6.3  Estimation  of  Component-Wise  Condition  Numbers 


When  A  is  a  mxm  triangular  matrix  upper  bounds  for  the  ||r,||  can  be  computed  in  0(m2)  operations 
by  making  use  of  ideas  from  condition  number  estimators  for  triangular  matrices  [16],  as  we  will  now 
show.  An  estimate  of  the  ||gi||  can  be  obtained  from  the  inequality  ||<7;||  >  ||»*< ||2  from  Corollary  6  in 
Appendix  1. 

Since  A  is  triangular,  (A-1),,-  =  1/a,-,  and  l/|a„|  <  ||r,-||  <  ||r,-||i.  Instead  of  A,  we  will  work 
with  its  comparison  matrix  C(A)  =  (c,;)  of  A  [3],  which  is  defined  as 

_  f  |a,i|  if  i  =  j 
C‘-’-\-|a,;-|  inl¬ 
and  satisfies  the  component-wise  inequalities 

C(A)-l>0,  lA-'I^CCA)-1 

because  it  is  an  M-matrix  [28].  The  first  inequality  implies  that  the  ith  element  of  C(A)~Te  equals 
||C(A)_Te,||i,  where  e  is  the  vector  of  all  ones,  while  the  second  one  implies  ||r,||  <  ||ri||i  < 
j|C(A)_Ted|i.  Hence  all  ||C(A)-Te,||i  can  be  computed  with  a  total  of  0(m2)  operations  by  solving 
the  system  C(A)Ty  =  e. 

When  A  is  a  general,  non-singular  matrix,  the  jjr<||  may  be  estimated  by  applying  the  above 
estimator  to  the  LU  factors  of  A.  Let  A  =  LU,  where  L  is  a  lower  triangular  and  U  is  an  upper 
triangular  matrix.  If  C(L)  and  C{U)  are  the  respective  comparison  matrices  of  L  and  U  then 

\L-l\<C(L)-\  \U-l\<C(U)~l,  |A-I|<|f/-1||i‘1l<C(t/)-1C(Lr1. 

Thus,  ||r<||  <  1 1 r(- 1 1 1  <  j|C(£)-TC(f/)-Te,-||1 ,  where  ||C(L)-TC(f/)-7’e<||1  is  the  ith  element  of 
C(L)~TC(U)~Te  and  e  is  the  vector  of  all  ones. 

Therefore,  if  the  LU  decomposition  of  the  mxm  matrix  A  is  available,  an  upper  bound  on 
the  ||r,||  can  be  obtained  by  solving  the  two  triangular  systems  C(L)Ty  =  e  and  C(U)T :  =  y  in 
0(m2)  operations.  Of  course,  the  same  is  true  for  the  Cholesky  decomposition  A  =  LT  L  of  a 
symmetric  positive-definite  matrix  A. 
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7  Summary 


Traditonally,  the  error  in  the  computed  solution  *  of  a  system  of  linear  equations  Ax  =  6  has  been 
estimated  from  the  norm  of  the  relative  error  \\x  —  x||/||x||. 


Amomg  the  advantages  of  norm-based  errors  are  straight-forward  perturbation  analyses  as  well 
as  clear,  simple  error  bounds.  For  example, 


<  n{A,x)e 


represents  an  approximate  upper  bound  on  the  norm  of  the  error  in  x,  where  c  is  the  size  of  the  relative 
perturbation  in  the  data  A  and  6,  and  the  condition  number  k(A,x )  determines  the  sensitivity  of 
the  solution  x  to  perturbations  in  the  data. 


In  the  simplest  case,  when  k(A,x)  equals  the  two-norm  condition  number  of  the  matrix  A,  the 
error  bound  frequently  turns  out  to  be  rather  pessimistic.  A  more  realistic  condition  number,  such  as 
Skeel’s  [19],  is  obtained  by  exploiting  the  structure  in  perturbations  from  error  analyses  of  algorithms 
for  linear  system  solution,  e.g.  [2].  Still,  as  we  illustrated  in  Section  1.1,  the  condition  numbers  are 
prone  to  overestimating  the  error  in  individual  components  of  x. 

As  a  consequence,  we  decided  to  pursue  a  perturbation  analysis  for  individual  components  of  the 
solution  x  without  making  any  assumptions  about  the  perturbations.  The  resulting  expressions  for 
the  component- wise  relative  errors  |i,  —  xt\  /  \xi\  are  simple  and  easy  to  interpret. 

The  terms  that  multiply,  and  possibly  amplify,  the  perturbations  in  the  component-wise  errors  are 
called  component-wise  condition  numbers.  Besides  being  amenable  to  nice  geometric  interpretations, 
they  are  able  to  reveal  the  existence  of  solution  components  that  are  much  better  conditioned  than 
existing  condition  numbers  would  lead  us  to  believe.  Moreover,  barring  any  restrictions  on  the 
perturbations,  we  showed  that  there  is  always  one  component  of  x  whose  condition  number  is 
proportional  to  /C2(A). 


We  conclude  that  no  norm-based  relative  error  bound  can  ever  predict  the  presence  of  well- 
conditioned  components  in  x,  assuming  no  restrictions  on  the  perturbations.  Therefore,  our  component¬ 
wise  condition  numbers  are  essential. 


It  would  be  interesting  to  examine  how  restrictions  on  the  structure  or  distribution  of  perturba¬ 
tions  affect  the  component-wise  condition  numbers. 
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8  Appendix  1:  Expressions  for  Left-Inverses 


We  give  ‘block’  expressions  for  the  left-inverse  of  a  matrix  with  full  column  rank,  which  involve 
angles  associated  with  columns  of  the  matrix,  and  one  can  clearly  see  that  each  part  of  the  left- 
ir  verse  has  a  geometrical  interpretation.  We  used  these  expressions  in  Section  2.4  to  justify  our 
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choice  of  measures  of  sensitivity  in  component-wise  relative  errors  for  the  solution  of  linear  systems 
and  least  squares  problems. 

In  order  to  explain  the  emergence  of  angles,  we  start  off  with  some  geometric  interpretations. 
Given  a  n  x  m  matrix  A  =  (a i  ...  am),  n  >  m,  with  columns  a,  the  matrix  AT  A  contains 
information  about  the  angles  between  individual  columns  a*  and  aj : 

(ATA)n  =  ajdj  =  IMIfoHcosty, 

see  also  [1],  page  65. 

In  contrast,  the  inverse  matrix  (A^A)-1  provides  information  about  the  angles  associated  with 
an  individual  column  and  the  subspace  spanned  by  all  other  columns,  as  the  next  theorem  shows. 
Thus,  while  AT  A  contains  ‘local’  information  about  columns,  the  inverse  provides  a  more  ‘global’ 
view. 


Theorem  11  Let  A  =  (ai  Ai)  be  a  nxm  matrix,  n  >  m,  of  full  column  rank,  where  a!  represents 
the  first  column  of  A  and  Ai  the  remaining  columns.  Let  c  be  the  solution  of  the  least  squares 
approximation  of  ax  by  the  columns  of  Ay,  so 

||aii|  =  min||Aiy-<n||,  -aj  =  A^-ai, 

where  ai  is  the  residual.  Similarly,  let  <f  be  the  least  squares  approximation  of  the  columns  of  A x 
by  a‘» 

Pi II  =  min||aiyT  -  Ai||  -  Ax  =  axdT  -  Ax. 


Then 


Proof:  Verify  that  multiplication  of  the  above  expressions  by  AT  A  gives  the  identity. 


These  particular  expressions  for  the  inverse  are  based  on  the  derivation  of  a  formula  for  partial 
correlation  coefficients  in  [9].  If 

-(?  S) 

is  a  symmetric  positive-definite  matrix  then  its  inverse  can  be  written  as  [8] 

M-\_(  {X  -  YtW~'Y)'1  -X-lY(W  -yx~1yt)-1\ 

“  _  YTW-lY)~'  (W-YX-1Yt)-1  )■ 


Apply  this  formula  to 


ATA-(a*ai 
AA~\Ajai  AjAt)’ 


and  use  the  fact  [9]  that  the  inverse  of  the  (1, 1)  element  of  (ATA)-1  equals 

afai  -  afAi(Aj'Ai)~lA'[ai  =  af  (l  -  At(Af  At)-1  A[)  a!  =  d(au 
where  /  -  Ai(Af  Ai)-1  Af  is  the  orthogonal  projector  [12],  page  7G,  onto  HL(A\). 
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Thus  a i  is  the  projection  of  ai  onto  the  orthogonal  complement  HL{A\)  of  the  column  space 
of  A\  in  while  A\  is  the  projection  of  A\  onto  the  orthogonal  complement  T2‘L(ai)  of  a i  in  3?n, 

ai  =  (I-j4i(AfAi)“1A]’)ax,  Ax  =  (I  -  ai(af  a1)-1a|’)  A, . 


It  is  easy  to  show  that  by  applying  a  permutation  to  the  columns  of  A,  the  above  theorem  can 
be  made  to  distinguish  column  a,  of  A  instead  of  column  ai. 


Theorem  12  With  the  notation  of  Theorem  11,  the  left-inverse  (AT  A)~x  AT  of  A  can  be  written  as 


Af  =  (ATA)~lAT  = 


( 


(dfat)  'aj 

(AJa^-UT 


)■ 


Proof:  Multiply  the  expression  in  Theorem  II  by  AT .  ■ 

One  can  clearly  see  that  the  first  row  of  (AT A)~iAT  is  orthogonal  to  Tl{A\),  while  the  remaining 
rows  are  orthogonal  to  ax.  Now  we  can  expr  the  elements  of  the  left-inverse  of  A  in  terms  of 
angles  associated  with  columns  of  A. 


Corollary  5  The  iih  row  rj  of  the  left-inverse  (AT A)~lAT  of  A  has  the  same  direction  as  the 
residual  in  the  least  squares  approximation  of  a*  by  the  remaining  columns, 


rf  =  e‘  (A  A)~lA 


i  aT 


1  -T  _  1 _ 1  -T 

afai  '  ~  ||o,j|cosa,- ||ai||a' 


while  its  length  is  inversely  proportional  to  that  of  the  residual, 


||a,l|  =  |]a,  cos  a,-  = 


Because  the  ith  row  r,  of  the  inverse  is  a  multiple  of  the  residual  a<,  its  norm  is  a  most  natural 
criterion  for  measuring  how  well  the  ith  column  of  a  matrix  can  be  approximated  by  the  other 
columns.  As  mentioned  in  Section  2.4,  Stewart  already  proved  the  relationship 

||a,'||  =  min  ||  A,  y  —  a«||  =  jp-jj. 


The  following  corollary  is  needed  in  the  derivation  of  the  component-wise  error  for  least  squares 
problems  in  Section  4.1.  It  states  that  the  length  of  a  row  in  (ArA)->  is  greater  than  the  square  of 
the  length  of  the  corresponding  row  in  A*  (they  are  equal  when  A  is  non-singular). 


Corollary  6  If  A  is  a  nxm  matrix  of  rankm,  rf  =  ej  .4* ,  and  qj  =  ef(ATA)~l  then  ||<7,j|  >  ||r,j|2. 


Proof:  Let 

■4/>  =  «(o) 
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be  the  QR  decomposition  of  A  with  column  permutations,  where  Pisamxm  permutation  matrix, 
Q  is  a  n  x  n  orthogonal  matrix,  and  iiisamxm  non-singular  upper  triangular  matrix.  If  =  Pe  j 
then 

qj  =eJ(ATA)~l  =  ej  PT(AT  A)~l  PPT  =  ej  R~l  R~T  PT  =  vj  R~T  PT , 
where  Vj  =  R~Tej.  As  A f  =  (ATA)_1.AT  we  get 

rf  =  q?AT=  vj  R~t  Pt  At  =  ( vf  0)QT. 

Hence  ||i\||  =  ||i>j||  and  |ta||  =  ||R-1»j||. 

For  a  fixed  P  consider  the  index  i  for  which  j  =  m.  The  upper  triangular  structure  of  R 
implies  that  vm  =  R~Tem  =  jCm.  where  p  is  the  element  of  R  in  position  (m,  m).  So  ||r,||  =  \/\p\. 
Substituting  vm  =  ±em  in  <7,  yields  =  PR~1vm  =  jPR~lem.  Hence  ||^||  >  1/p2  and  ||g,||  >  ||r,||2. 

In  order  to  prove  the  corollary  for  all  rows,  choose  a  sequence  of  permutation  matrices  such  that 
ej  =  P^m  for  1  <  »  <  m.  ■ 


Remark  2  The  angles  a,-  associated  with  the  columns  of  A  are  equal  to  the  corresponding  angles 
associated  with  the  rows  of  (AT A)~{  AT . 

To  see  why,  recall  that  is  the  angle  between  a  j  and  r*,  where  r*  constitute  the  columns  of  B  — 
[(ATA)~lAT]  ,  and  AT  is  the  right-inverse  of  B  with  rows  af .  Therefore  the  relevant  angles  are 
those  between  r,-  and  a,-,  which  are  just  a,-. 


Appendix  2:  More  Perturbation  Results 


Here  we  derive  perturbation  results  for  linear  systems  that,  unlike  those  in  Section  3,  do  not  contain 
computed  quantities  in  the  error  expressions.  This  is  possible  because  the  perturbations  are  expressed 
differently. 

Let  x  be  the  computed  solution  of  the  linear  system  Ax  =  b  and  the  exact  solution  of  (v4+F)r  =  b. 
In  order  to  assess  in  more  detail  the  effect  of  perturbations  in  the  matrix  on  the  component-wise 
relative  error,  we  distinguish  two  cases  regarding  perturbations  in  the  matrix  A:  perturbations 
confined  to  the  column  corresponding  to  x<,  and  perturbations  of  all  columns  except  for  the  one 
corresponding  to  z,-. 

Denote  by  and  fi  the  respective  columns  of  A  and  A  +  F, 

=  (  al  •  •  ■  <J|  — 1  ®i+l  ■  ■  •  0m  )  i  F  —  (  f\  •  ■  •  fi—  1  fi  fi+l  ■  ■  ■  fm  )  • 

We  distinguish  the  ith  columns  a,  and  /,  from  the  remaining  columns  by  introducing 

Ai  —  ( Ql  ...  fl<_  1  a,  +  i  ...  am  )  ,  Fi  —  (  fl  ...  fi  —  l  fi  +  l  •••  fm  )  • 

As  before,  a,-  denotes  the  angle  between  a,-  and  and  its  projection  on  spanjj-^fa*},  while  /?,•  denotes 
the  angle  between  b  and  the  projection  of  a,  onto  spanj^.fa*}.  Due  to  the  particular  expres¬ 
sions  for  the  perturbations,  the  following  results  are  formulated  in  terms  of  the  projections  a,-  of  a, 
onto  span ^ {a*},  rather  than  in  terms  of  the  rows  rj  of  AK  This  represents  only  a  small  change, 
as  according  to  Corollary  5  in  Appendix  1  ||d,j|  =  ||a* j)  cos  or*  =  l/||r;||. 
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First  consider  the  case  when  only  column  a,  is  perturbed,  that  is,  F,  =  0.  In  Section  6  of  [21] 
Stewart  discusses  this  situation  in  the  context  of  errors  in  regression  variables.  Due  to  his  assump¬ 
tions  with  regard  to  the  statistical  nature  of  the  errors,  it  is  difficult  to  compare  his  and  our  results. 

The  following  lemma,  whose  proof  appears  at  the  end  of  this  section,  applies  to  both  linear 
systems  and  least  squares  problems.  It  shows  that  in  case  of  linear  systems  the  effect  on  x i  of  the 
perturbation  /,  is  making  itself  felt  in  terms  of  its  size  ||/,||  and  in  terms  of  its  distance  to  the 
space  of  the  other  columns.  In  some  sense,  the  effect  of  fi  is  confined  to  column  a,  -  although  the 
direction  of  /,•  itself  is  arbitrary.  The  reason  is  that  the  space  spanned  by  the  remaining  columns  has 
dimension  n  —  1.  So  its  orthogonal  complement  spanj^fa*}  has  dimension  one.  But  both,  ai  and 
/,,  are  projected  into  this  one-dimensional  space  in  order  to  make  up  ij.  Therefore,  the  effect  of  any 
perturbation  in  the  ith  column  is  confined  to  the  one-dimensional  space  span^^a*}.  Consequently, 
unless  ai  +/i  is  zero,  there  is  no  question  about  the  right-hand  side  6  remaining  in  the  column  space 
of  A,  so  cosy?,  does  not  enter  the  relative  error  for 

In  the  case  of  least  squares  problems,  the  effect  of  fi  still  remains  confined  to  column  a,  but  now 
the  relation  between  perturbation  and  right-hand  side  matters  as  well,  and  cos/?,  enters  the  picture. 
The  smaller  the  contribution  of  a,  to  b  outside  the  space  of  the  other  columns,  the  more  the  angle 
between  fi  and  b  matters.  Due  to  the  quadratic  nature  of  the  least  squares  problems  we  get  squared 
condition  numbers  in  front  of  second-order  error  terms. 


Lemma  1  Given  matrices  A  and  A  +  F  of  full  column  rank  with  Fi  =  0,  let  x  solve  minv  \\Ay  —  6|| 
and  x  solve  min,,  ||(.A  +  F)y  —  6||.  Let  <j>i  be  the  angle  between  fi  and  the  projection  ai  of  ai 
onto  span^,- {a*},  and  pt  =  ||/i||/|K||. 


If  Xi  =:  0  then  ij  =  0. 


If  Xi  0  then 


Xi  =  Xi 


1  _L  n  CO»<bt,.COt  a»., 
_ '  cos  a,  cot  (3, _ 

i  +  2  +  p?co,,y/.’ 

c*  comer,  ’  r ,  cos J  a , 


*i  ~  *i 
Xi 


Pi 


2  cos  d>i  +  Pi-~*J‘'  - 


_ 1  COS  1 1>b,, 

cosVfs-EZ^? 


COSQ ,  1  +  2pi^±L  +  p?coga*-f'' 

'  cos  Qt  '  '  *  COS*  a , 


where  <j>h,i  is  the  angle  between  b  and  the  projection  fi  of  fi  onto  span^^a*},  and  d>s,i  is  the  angle 
between  fi  and  fi. 


If  Xi  ^  0  and  A  is  non-singular  then 

1 


Xi  —  x 


*  1  +  ^5 ±ipi  ’ 

1  COSO,"* 


Xi  -  Xi 
Xi 


Pi 


cos  4>i 


cos  a,-  1  +  S21±l p. ' 

'  cosa,^* 


The  relative  perturbation  pi  cos  <j>i  associated  with  a  linear  system  is  amplified  by  the  factor 
1/ cosaj,  which  is  independent  of  the  righthand  side  in  contrast  to  Corollary  1.  Therefore,  the  more 
the  corresponding  column  a,  lies  in  the  space  spant^,{a*}  spanned  by  the  remaining  columns,  the 
larger  the  relative  error  in  the  ith  component  of  x. 

With  the  abbreviations  k,  =  1/cosa,  and  p\  =  —  pi  cos d>i,  the  above  component-wise  relative 
error  for  non-singular  linear  systems  can  be  written  as 

Xi  -  Xj  _  Kjpcj 
Xi  1  -  KiPi  ’ 
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which  closely  resembles  the  norm-based  relative  error  for  the  perturbed  system  ( A  4-  F)x  =  6 

\\x-x\\ 


<  <A)p{A)  liFJl 

—  1  _/  A\-t  A\  »  P\A) 


r||  "I  -k{A)P{A)' 


U\\ 


Now  consider  the  case  when  all  columns,  except  for  the  one  associated  with  Xi,  are  perturbed, 
that  is  fi  =  0.  Given  the  original  linear  system  Ax  =  6,  where  A  is  a  n  x  n  non-singular  matrix,  let 
the  perturbed  system  (A- f  F)x  =  b+  f  satisfy  ||A_1||  ||F||  <  1.  This  means  that  A  +  F  is  also  non¬ 
singular  because  A  +  F  =  A(I  +  A~lF),  and  I  +  A~lF  is  non-singular  if  ||A-1F||  <  ||A-1||  ||F||  <  1 
[12],  Lemma  2.3.3.  The  proof  of  the  following  lemma  appears  at  the  end  of  this  section. 


Lemma  2  Let  Ax  =  b  and  ( A  +  F)x  =  b  with  fi  =  0,  and  ||A-l||  ||F||  <  1. 
Define  the  matrix 


where 


Ai  =  Fi  (/  +  (Aj Ai)~lAj Fi)~l  (A?  Ai)~1Af , 

HA  II  <  <A)PA  n  -  i!£ii 

11  111  -  1  -  «(A)Pa  '  Pa~\\A\\' 


and  the  related  relative  errors 


Pa,~  lla,{|  ’  Pi'l~  11611  ‘ 


Let  d>a,i  be  the  angle  between  A  iOi  and  the  projection  a,  of  a,-  onto  span^,{a*},  and  <f>b,i  the  angle 
between  A, 6  and  a,. 


If  Xi  jb  0  then 


aj (/  -  A<)6  1  -  ■r.r  PM  Xi  -  Xi  coma 

Xi  =  — -  =  *-• - 


aT(i-Ai)ai  x\.s^Pa: 


CQ«»...  comQm,x 

,  P«,i  co mp. 


If  H  =0  then 


Xi  =  - 


||A,t||  co»0t,. 
||o,  co«a, 


1  _ 

1  com  OpA*"'1 


*  i  _  II  A.  II  C om4>,.,  ■ 
l|a,ll  coa  or, 


The  vectors  A an  '  A, 6  are  in  the  column  space  of  F,.  If  the  column  space  of  F<  is  a  subset  of 
the  column  space  of  .4,  -hen  F,  is  orthogonal  to  span^,  {o*},  and  cos  and  cos  <f>b,i  are  zero,  which 
implies  it  =  ij.  The  angles  d>a,i  and  4>b,i  are  bounded  above  by  the  angle  between  span^,{at}  and 
the  column  space  of  F*.  As  before,  the  relative  error  in  the  ith  component  of  x  is  the  larger  the  more 
the  right-hand  side  6  and  the  corresponding  column  ai  lie  in  the  space  spanned  by  the  remaining 
columns. 


Proof  of  Lemma  1 


From  Theorem  1  and  Corollary  5  we  have 


(gj  +  fi  )T6 
(d,  +  /<)T(a, +/,•)’ 
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where  ft  and  f  are  the  respective  projections  of  ft  and  fi  onto  span^fat}. 

We  first  prove  the  easier  case  when  A  is  non-singular  and  span^,{at}  has  dimension  one.  So  /, 
must  be  a  multiple  of  ft,  i.e.  fi  =  A ft  for  some  real  number  A.  Therefore 

_  (l  +  AMia^H  ||6||  co»A  _  INI  11*11  C08  ft 
(1  +  A)af(ft  +  fi)  of  (ai  +  fi) 

where  ft  is  the  angle  between  ft-  and  b. 


If  <t>i  is  the  angle  between  fi  and  ft ,  then  the  denominator  of  ft  equals 


ft  (ft  +  /•)  =  ||ft||  ||ft||cosa<  +  ||di 


cos  ft. 


Therefore, 


Xi  = 


I  cos  ft 


Hft  ll  cos  a,-  +  ||/i||  cos  ft 


=  Xf 


1  + 


since  ar,-  =  ||6||cosft/(||ft||co8aj)  by  Theorem  1  and  Corollary  5. 


In  the  general  case  when  span 4^,. {a*}  may  have  dimension  greater  than  one,  the  projection  /,  is 
not  a  multiple  of  a, .  So, 

m 


1  + 


a  n 


X  i—X 


'  1  4.  2  “ll  +  LLLl 

"*  4 a f '  oTal 


Denote  by  ft*  the  angle  between  f  and  6,  and  by  ft,,-  the  angle  between  /,  and  /,-.  So, 


fjb  =  ||/<||||ft||cosft,<,  fjfi  =  ||/7ll  II/. II  cos  ft,,-. 

From  Corollary  5  we  know  that  ||ftj|  =  ||a,j|  cosa,-.  In  a  similar  fashion  we  can  show  that  ||/j||  = 
H/ill  cos  ft,,-.  This  gives  the  expression  for  the  relative  error  in  the  general  case. 


Clearly  ft  =  0  whenever  r,  =  0. 


Proof  of  Lemma  2 

Similar  to  the  proofs  in  Appendix  1,  it  suffices  to  show  the  statement  for  ft . 

From  Theorem  I  and  Corollary  5  we  know  ft  =  af  Pb/af  Pax,  where 

7  =  /  -  (Ai  +  Fx)  ((Ax  +  Fx)t(Ax  +  Fx ))"*  (ft  +  ftf. 

We  consider  numerator  and  denominator  of  ft  separately. 

In  order  to  get  rid  of  the  inverse  inside  the  projector  P  we  would  like  to  apply  the  Sherman- 
Morrison- Woodbury  formula  [12],  page  51,  in  such  a  way  that  the  inverse  of  the  matrix  sum  is  a 
scalar.  This  is  possible  if  the  sum  consists  of  rank-one  matrices. 

To  this  end  decompose  Fi  into  its  components  in  spanj^fa*}  and  span*.^ {a*}, 

Fx  =  Ax(AfAx)-lAfFx  +  Fx,  where  F,  =  (/  -  Ax(Af  Ax)~lAf)Fx. 
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Hence 


Ax  +Fi  =  AiZ  +  Fy,  where  Z  =  /  +  (Aj  At)~l  Aj  Fu 

and 

(Ax  +  FOt(Ai  +  Fx)  =  (AXZ  +  Fx)t(AxZ  +  Ft)  =  ZtaJAxZ  +  F?Fx 

since  Af  Fx  =  0.  By  means  of  the  singular  value  decomposition  of  A\  and  the  interlacing  of  singular 
values  of  Ax  and  A,  [12],  Section  8.3.1,  we  can  show  that  ||(j4f  -i4x)_MfFi||  <  ||yl_1||  ||F||  <  1,  so 
by  Lemma  2.3.3  in  [12]  Z  is  non-singular.  Thus, 

P  =  I  -  (Ax  +  FiZ~l)(Aj Ax  +  Z~T F? FiZ-l)~l(Ax  +  FiZ~l)T . 

As  each  column  of  Fi  is  in  spanjh^a*},  and  span^^a*}  has  dimension  one,  A  has  rank  one 
and  we  can  write  Fx  =  dx  f1  for  some  (n  —  1)  x  1  vector  /  (the  fact  that,  in  case  of  least  squares 
problems,  the  rank  of  F  exceeds  one  has  sofar  prevented  us  from  proving  this  theorem  in  the  more 
general  case).  Now  we  can  apply  the  Sherman-Morrison- Woodbury  formula  to 

(Ay  Ax  +  Z~T  Ff  F\Z~*)~X  =  (Af  Ax  +  afax Z  TffTZ  l)  1 
=  (  A?  Ax  r1  -  if  ax  (ATAx)-1Z~Tf(l  +  a?  ax  fT  Z~'  (Af  Ax  )-1  Z~T  f)~'  fT  Z~'  (Af  Ax)’1 


Distinguishing  the  scalars 


a  =  afai  =  afax, 


C  =  fTZ-\AfAx)-xZ-Tf 


P  =  I-(Ax+  FxZ~l)  ((Af At)-1  -  T^(AjAx)-1Z-TffTZ-l(AfAxr1^  (Ax  +  F1^1)T- 
Multiply  the  terms  in  the  product  and  use  y  —  Ax(Af  Ay)~l Z~T f  to  get 

~P  =  I  —  ^(.Af-Ai)-1^  -f  (di/ -  <ryyT  +yaf  +0 Mf)^  . 

The  numerator  of  iy  equals 

°fPb  =  J^^fi  -  °vTh)  =  -  F,Z-'(AjA1)-,Ar)i  =  if  (I  -  A.)i, 

where  At  =  FyZ~1(Af  Ay)~lAf .  Similarly,  the  denominator  equals 

afPax  =  1  aif(afax  -  <ryT ay )  =  l  ~  FxZ~l(Af  Ax)~l  Af)a{  =  df(I  -  Ax )ay- 

1  +  <?(,  l  + 

Therefore  _ 

.  df(7  -  At)6 

Xl  ~  «f(7  -  Ai)ai  ’ 

which  represents  the  first  equality  for  zx  in  the  statement  of  the  theorem. 

Let  0x  be  the  angle  between  di  and  b,  and  be  the  angle  between  di  and  Ayb.  The  numerator 
of  x  i  satisfies 


afb~a.fA\b  =  ||di||  ||6||cos/?i  -  ||di||  IjAiftij  cosfo.i  = 


cos 0x  -  !|Ai&||  cos^  i). 
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Similarly,  with  aj  being  the  angle  between  di  and  ai  and  with  4>a,  1  being  the  angle  between  dj 
and  Ajai,  the  denominator  satisfies 

ajai  -  afAjai  =  Hd!!!  ||aij|  coscn  -  HdxH  HAiaxlj  cos^a.i  =  11^1  [[ cosori  -  ||Aiai||cos<£ail). 


Hence, 


If  xi  =  0  then 


_  _  ||6||  cos  fix  -  ||Ai&||cosi£m 

1  ~  IMI  cosai  -  IIAtaiH  cos<£a>i  ' 

.  _ _ II Ai6ff  cos  <fo,i _ 

1  |jai|| cosai -||Aiai || cos <^a,i’ 


and  otherwise 


xt  = 


||6||cos/?i 

llaill  cosai  i  _  lift-1*;  11  c.9a.^1 

II  a  i||  co«ai 


1  II A  i  >11  eo»  <t>b,  i 

_  _  11*11  COS/?l 

1  1  IIAifli  ||  co»<<>,,i 

*  II O  ill  co«or, 


by  Theorem  1  and  Corollary  5.  This  establishes  the  second  equality  for  x\  in  the  statement  of  the 
theorem  and  the  expression  for  the  relative  error  in  x\. 
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